
So, I'm here today to give a presentation called "Crunching the Top 10,000 Websites' Password Policies and Controls." But before I get started, I have a disclaimer that I'd like you to read and agree to.
And there's my real disclaimer. The last line is really the most important piece: please don't sue me or anyone I do work for. A little background about myself. My name is Steve Worby. I've been in infosec for about 14 years. Until about a year ago, I was a CISO; I held that role at a few different organizations. Prior to that, my roles were technical. I have some degrees, I have some certs, and I've given some presentations. Currently, I work for a Fortune 200 company, where during the day I do security architecture. Outside of normal working hours, I also work as a security researcher and consultant. Thank you. Okay. So, I'm a really big fan of
LinkedIn, but I'm not a really big fan of the LinkedIn endorsement. So, I thought I'd show you the types of skills that I get endorsed for. If you look at this list, there's a few on here I'm not really familiar with, so I obviously decided to accept all these endorsements. I'm most fond of bass songs; I'm very knowledgeable in that arena. Can we see some hip-hop dancing? Is that one of the ones on the list? Yeah. Oh, it is. Whiteboarding. That might be one that I don't have the skill for. Whiteboarding. Actually, I'm very good at that one. Waterboarding, whiteboarding, anything that ends in boarding. Surfboarding. A little more background about myself. I like to drink craft beer; I have one in my hand. I haven't signed my own name the same way twice in, I guess, about three and a half years now. So I do things like sign the name of the conferences I'm presenting at. My wife, oh, there she is, my wife's in the front row, and she hates this. She's gotten to the point where, if we go out to eat at a restaurant, when the credit card bill comes and I'm about to sign it, she decides she's going to leave the restaurant because she doesn't want to be around for it. So she told me one time recently to sign it normal. So I signed it "Normal, per wife's request." I'm pretty active on Twitter. I like to think every once in a while I have good security content, but mostly people seem to like the jokes I make and the funny things I say. There was some talk at Black Hat about the Constitution a day or two ago, so I wanted to put the Constitution up so everyone could say that they've read it. I apologize if you can't read it; it's a little small. In terms of ground rules for today, I'm going to try to move pretty quickly through the material. Feel free to jump in and ask
me any questions or make comments as we go. If it looks like I need to speed things up, I might have to hold the rest of the questions until the end of the presentation. But don't be shy; jump in. Okay. So, I want to back up to about a year ago. I mentioned I was a CISO for about six years. A year and two days ago, actually the day before Black Hat and Defcon, I resigned from my position as CISO at the organization I worked for. The very next day, I flew out to Vegas and spent eight days here. When I got back, I did a lot of this. I spent a lot of time in San Antonio around the pool, and I did a lot of thinking about what I was going to do next. My plan was to do consulting, and I did that for almost a year before I moved into my current role. When I was sitting around the pool drinking beer, one of the things I thought about was: what could we do in the information security field that hasn't been done? What research has not occurred? I started by looking at mostly technical things. I was doing a lot of work around password cracking with cloud services like Amazon EC2. Eventually, I put my CISO hat back on and decided to look at some things that are not quite as technical.
One of the things I looked at was password policies and other security controls for protecting user accounts. I looked to see if there had been a substantial amount of research done in this area, and I really didn't find any. It's possible there is some; maybe I'm just not very good at searching for it. So, if you come across any I may not have seen, definitely point me in the right direction. So, I decided to give this idea a whirl. The two things I really wanted to answer, or attempt to answer, were: are sites doing a good job of protecting users' accounts, and how much control do security-conscious users have if they actually want to protect the authentication to the systems they use, or other controls around those user accounts? What can they control versus what can the sites control? So then I got to thinking about how I'd go about doing this research. Question one was: what sites should I look at? There are obviously millions of websites, and I don't have time to look at them all, so I wanted to look at a subset. And what kind of data was it possible for me to collect, and how would I go about gathering it? I like to try to keep things simple, although sometimes that gets out of hand and I kind of go down some
rat holes. But I tend to like to use tools and services I'm very familiar with, and one I'm familiar with, or have been familiar with, is Alexa. The way Alexa, which is now owned by Amazon, works is that there's a toolbar you can install in your browser, and it collects information about the websites you visit. That's reported up to Alexa, and they use it to gather data such as how often you visit sites: how many visitors and page views, that kind of thing, and it drills down into a little more detail than that. One of the things they do is develop a ranking of websites based on some secret sauce that's not shared. It's a three-month moving average of daily unique visitors plus page views to a website. What that particular algorithm is, I don't know, but for my purposes it really didn't matter. I just wanted a list of fairly popular websites so I could start to look at them. Out of curiosity, has anyone here ever actually run the Alexa toolbar? Okay, so this is the second time I've talked about this, and I've yet to meet anyone who's actually willing to admit that they have the Alexa toolbar installed, but allegedly it's installed on millions of computers worldwide. And so there's a pretty substantial amount of
data that's collected from it. So with Alexa, you can just go to their website, type in the name of a particular website, and it will tell you what the ranking is. Generally, unless it's barely used, you'll get some data back. I threw a few URLs up here for you to look at. If you want to look at the top 500 global sites, there's a URL you can go to. If you want to see the top 500 sites in the UK versus Brazil, you can drill down to that level of detail. But the really nice thing is that they make a zip file available, updated on a daily basis, that has the top million
global websites, which is pretty cool. So, I decided I was going to start with that, but I wasn't going to look at the top million; I was just going to look at the top 10,000. I want to give you an example of what this data looks like. This is the top 50 global websites, plus number 10,000, just to show you what would be at the bottom of the list. And I have no idea what that website is. I did look at it at one point, but I don't recall. But one of the flaws, or issues, I saw with this list, being that I'm a US citizen who primarily speaks English, is that there are a number of websites, even in the top 50, that are international and not in the English language. You know, the number five website is a Chinese website. There are multiple versions of Google. There are porn sites, which is not necessarily a bad thing, but I'm told that number 45, number 48, and probably a few others may be adult content. I don't know for sure; I never attempted to validate it. I had some other people validate it for me. So then what I really wanted to do was look at the top 10,000 US websites, and that's much, much more difficult to come by. There is no free downloadable version of the top 10,000
US websites. However, Alexa is now owned by Amazon, and there's an Amazon API where, for a fraction of a cent per record, you can make requests, and you can do it at the granularity of, say, I only want to see websites that are used heavily in a particular city or country. So, I narrowed it down to the US. I grabbed the API reference manual; they have code in four different languages in their code repository. I've been programming in PHP and some other languages for a long time, and of the four listed, PHP is the one I'm most comfortable with. So I chose PHP, modified the code to meet my needs, and wrapped it in a bash script, because you can only make a call for 100 records at a time, and I needed 10,000 records. So this is what my code looked like. I won't actually show you the code as PHP, but the amount of modification I had to do to the code that was in the repository was fairly minimal. I did make a mistake, though, so there's a bit of a lesson learned here. In my haste to utilize this, I performed a test that made just one iteration, which cost me 25 cents per iteration for 100 records, and it gave me what I wanted. Then I wanted to grab all 10,000 records, and it appeared to enumerate all 10,000 records. But when I did a wc -l to count the number of lines in my output file, there were only something like 400 lines. I had an error in my bash script, so I paid 25 bucks and only got about 4% of the data I wanted. Don't make that mistake. After I got the data back, here's what the top 10,000 websites for the US looked like. You can see there are some porn sites in there still, and that's again fine, and there are websites in the top 10,000 that are based outside the US and are potentially in languages besides English. But it was a
whole lot cleaner and easier to work with. And I don't want to say that I'm not concerned or interested in websites outside the US that might not be in English; it's just that I can't really easily work with those myself, and I had to start somewhere. So I started with this list. Okay. So I went back to the pool to do some more thinking, and I actually brainstormed. I think I'm missing a slide here, unfortunately. I made an update and my QA was a little poor. I was going to show you a list of about a hundred different attributes that I was interested in collecting. So, just imagine a hundred different things about website security controls: minimum password length, maximum password length, whether the site uses SSL for authentication, whether there are session cookies and how well they were protected, etc. Just visualize that. And now let's look at this graphic. This graphic is to show you the information overload. I decided to step back and start with a smaller list that was prioritized, so I initially worked with about 20 attributes. My process was to take those 20 attributes and go visit some websites: attempt to create an account, make some modifications to my password, inspect whether SSL was being used, look at cookies, and do a
number of things like that. Based on that, I came to the conclusion that some of my attributes were not quite as granular as I needed them to be, and I identified some additional things I had not originally identified that I wanted to collect data for. Just to give you an example, maximum password length seemed to me like a pretty trivial thing. But as I dug into it more, I realized it wasn't quite so trivial. In some cases, I'd visit a website and it might actually state that there's a maximum password length of 12, but it might turn out that I could only actually enter a password of up to eight characters, or I might be able to enter one that was greater than 12. So, I decided there was more information I potentially wanted to capture. Some websites didn't even tell me the maximum password length until I entered one that was too long. So, I started to collect more information about these attributes. Another example, something I originally wasn't going to look at, was whether the password was emailed to me automatically after account creation. Yeah, really important. Very important. And that, to me, was a sign that the password was probably being stored in plain text, because if the folks behind that website would send me the password in plain text, I didn't get a warm and
fuzzy feeling that the password was hashed. It was most likely not hashed; if it was, it was a pretty weak hash that was reversible. It was probably not encrypted or obfuscated in some way. So that told me a little more information, and I decided that was an interesting piece to capture. I also started to notice some patterns. There were sites that weren't in English. There were websites that didn't have user accounts, which made sense. There were websites that you couldn't register on; for example, on some bank websites you already have to be a customer, so you can log in, but there's no way to create an account. And there were websites, a lot of websites actually, that had the option to use third-party authentication via OAuth or another service, so you could log in or access the account via your Facebook account, LinkedIn account, Instagram account, etc. Then I got to a point where I started organizing the attributes I wanted to look at, and I noticed that they fell into seven real categories. One was password strength, and on the right I show some examples of the attributes that fall into those categories. It's not a complete list. Another category was
unauthorized access prevention/detection. So, we're talking about two-factor authentication, and whether, after you log in, it tells you the last login details. The last IP address and the date and time the user logged in would be a way for the user to identify that there was potentially an unauthorized access to their account. Next, password recovery, which I actually consider authentication bypass. From a user perspective, it's password recovery: a way to get into your account if you forget your password. The way I look at it, wearing my security hat, is that it's a way to bypass authentication. Moving on down the list, probably the piece I actually looked at the least was the breach and vulnerability history. I have a hypothesis that websites with a history of their passwords being leaked, or that do other things like not working well with researchers who disclose responsibly, may not have security controls that are as good as those of websites that haven't had data breaches and work well with security researchers. So that was a piece of information I attempted to collect. After I made some iterations through this process, I determined an efficient workflow. For example, if you want to identify the minimum and maximum password length, you can't easily do that by creating one password. You actually have to create a number of
passwords, so you really need multiple accounts to do that. You might think you could get around that just by using the change password functionality. But here's what another person working with me found: we identified a site that had a different password policy for account creation versus a password reset initiated by a code sent to email versus the user proactively changing their password. Three different password policies. So, it wasn't quite as simple as one would think, but I determined an efficient workflow for dealing with the data. Then I sat down to estimate how much work it would take to collect data for these 10,000 websites. The first piece, really, is to determine whether there's a way to create an account; on average, that took me about 20 seconds per site. Then I had two buckets of attributes I wanted to collect data for. The all-you-can-eat bucket of dozens upon dozens of attributes took me an average of about four minutes: two minutes for some sites, eight minutes for sites at the other end of the spectrum. For a smaller subset of the attributes, it took me one to four minutes, with an average of two. Based on my early investigation, it looked like about 70% of sites would be in scope, because they were in English and they had an account
login or account-creation capability. So I started to do some math, and after I did the math, I determined it would take me about 17 hours just to determine whether these websites had a way to create an account, and another 234 hours to actually collect the attribute values I wanted. That's 251 hours total. This is something I'm not doing within the context of my day job, and I try to have good work-life balance, so I felt that if I allocated two hours a day, it would take 126 days straight. That was a long time. And this ignored the semi-automated data collection I wanted to do, which involved things like brute-force attacking my own accounts on those websites (not other people's accounts) and doing things like inspecting SSL certificates. I wasn't even including that in the math. So that was too slow an effort, and I decided to think about a different way to do it. So, I went back to the pool, and I decided there were a number of ways to approach this. One way was that I could hire a part-time worker, but I didn't have the funds to do that; it would still cost me thousands of dollars. I could try to coax people I know into providing me assistance for positive karma and some recognition; I decided to pursue that. I was also familiar with a paid crowdsourcing website called Amazon Mechanical Turk, which I hadn't used, so I decided to give that a try. But before I did any of that, I decided I was going to break the data down into a
meaningful set of blocks and figure out how I was going to work with it. I decided it was probably more important to focus on collecting this data for the top 100 websites than for, say, the bottom 100 of the top 10,000. So I broke the websites down into rankings in particular blocks. Then I took the more important blocks and put them in a tier where I wanted to look at more attributes, and for the less important blocks, or tiers, I wanted to look at fewer attributes. Then I randomized the data. The way I did the randomization may not be cryptographically sound, but it worked for what I was trying to do.
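A minimal sketch of that blocking and shuffling, in Python. The block boundaries and site names here are made up for illustration; the talk doesn't give the exact tier sizes.

```python
import random

# Hypothetical ranking blocks; the actual tier boundaries weren't specified.
BLOCKS = [(1, 100), (101, 1000), (1001, 5000), (5001, 10000)]

def build_work_queue(ranked_sites):
    """Split a rank-ordered site list into blocks, then shuffle each block
    so workers receive sites in a random order within a tier.
    random.shuffle is not cryptographically sound, but that's fine here."""
    tiers = []
    for start, end in BLOCKS:
        block = ranked_sites[start - 1:end]  # slicing copies, so shuffling is safe
        random.shuffle(block)
        tiers.append(block)
    return tiers

sites = ["site%d.example" % rank for rank in range(1, 10001)]
tiers = build_work_queue(sites)
```

More important tiers (the front of the list) can then be handed a longer attribute questionnaire, and less important tiers a shorter one.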
So one of the things I did was work with unpaid volunteers, and I actually didn't do this part until very recently. What I did was utilize my social media accounts: I just threw some posts out there and said, "Hey, does anyone want to help me with this project?" On Twitter, I actually got a pretty good response. I had probably 25 or 30 people respond and say, "I'm interested in more info." So I sent them a really long email with instructions, and I sent them some data and said, "Hey, if you have a chance, great, return it to me. If not, no worries." I tried Facebook as well. I tried LinkedIn, and then I solicited some of my family members. Interestingly, I didn't have a very good response with LinkedIn, and I have a pretty substantial professional network, so I don't really know what that says about LinkedIn. Twitter, at least for me, was better. The individuals who responded were largely in IT or information security, but some of the people who responded were not in a technical field whatsoever, which was okay. I tried to construct the research in such a way that someone without a technical background would be able to collect the data for me. One of the things I wanted was some assurance that
people weren't just making the data up or not following the instructions properly. So when I broke the websites up into chunks of 20 sites, I assigned everyone a unique block of 19, and then I assigned them a control website that was the same as a website assigned to somebody else. I'll talk a little later about why I did that. I'm not actually going to go through all the questions, because later on I'm going to show you some of the analysis. But what I did was create a spreadsheet, and for most of the fields I had drop-down boxes to reduce user input error. For some of the fields I didn't, and I'll talk later about what some of the ramifications of that were. There are actually at least a couple of people in the room, at least three, actually, who collected some of the data for me. So if any of you have any input, feel free to jump in at any point as well; I'd be interested to hear what your experience was with this process. I'm just going to jump through the spreadsheet. Again, about 85% of the questions had drop-down boxes. What I show here is: in the first column, the question I showed the user; in the second column, the list of values that were in the drop-down boxes, just to make it visually easier for them to see; and in the third column, where they actually responded.
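The chunk assignment above can be sketched roughly as follows. For simplicity, this version shares one control site across the whole batch, whereas the talk paired each volunteer's control with a site assigned to a different volunteer; the site names and batch size are invented.

```python
# Each volunteer gets a unique block of 19 sites plus one control site that
# somebody else also checks, so control answers can be cross-compared later.
def assign_chunks(sites, control_site, block_size=19):
    assignments = []
    for i in range(0, len(sites), block_size):
        block = sites[i:i + block_size]
        if len(block) == block_size:  # ignore a short trailing block
            assignments.append(block + [control_site])  # 19 unique + 1 control = 20
    return assignments

volunteer_sites = ["site%d.example" % n for n in range(1, 96)]  # 95 sites -> 5 blocks
chunks = assign_chunks(volunteer_sites, "control.example")
```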
So I got some data back from the unpaid volunteers, and I did two things to validate its accuracy. Statistically, what I did was just a very rough pass; I just wanted some assurance that the data was pretty good before I started sharing my analysis. I mentioned I had the control sites, so I had some websites that two to five people provided data back to me on, and I compared the results that were reported back to me. If it all lined up, I felt pretty good that those people knew what they were doing. I also randomly picked out some other websites and manually did my own testing to validate them, and based on that, I concluded that about 92% of the results matched either my own testing or multiple volunteers' identical data. So I felt pretty good about that. An 8% error rate is a little higher than I would have liked, but the flexibility of the research allows me to have additional volunteers or paid folks do more research and validate the data. So that's okay; it's a starting point. Anyone know what this is? Mechanical Turk. Mechanical Turk. The Mechanical Turk was an awesome creation, I think in the 1700s, where this machine would play chess with you. So you'd play on the board, and the machine moved the
pieces. Or actually, I can't recall if the machine itself moved the pieces; yeah, I believe that's how it worked. Everyone thought this was great, that this machine could do this, but in reality, there was a small person inside the box, and there were magnets on the bottom of the pieces. The person underneath could see what was going on, and he was a very good chess player, so people were amazed by this. Then Amazon came along and created a service called Amazon Mechanical Turk in 2005. Has anyone used this to actually perform a job? Okay, one. Does anyone use it to post jobs? Okay. So, I hadn't done either of those things, and I decided I was going to use it. The way this works is that people or organizations post jobs out there. They're generally things that are very easy for a human to do, but may be difficult for a machine to do. For example, you could be presented with four images and asked to identify which ones are humans and which ones are pets. There is technology that lets a computer do that, but it's not going to be as accurate as a human. There are also jobs like: I want to identify all the street addresses, phone numbers, and hours of operation for every bank in the US. I could do that manually, and it would be very difficult and time-consuming. I could write something to scrape a bunch of websites. Or I could pay some people who are outside the US and willing to do work for about $3 an hour, or who live inside the US and like to do this kind of thing to make some extra cash and keep their brain going while they're watching TV. That's the kind of thing you can use this for. So I did some work myself, and I found some jobs out there that had me go to websites, perform search
queries, and collect all kinds of data that was trivial to collect, and I was paid between 5 cents and 25 cents for each little job I did. So this is not going to be a good way to make a whole lot of money. In terms of the terminology, the people who do the work are called Turkers, and the units of work are called HITs: human intelligence tasks. But we don't really need to worry about the terminology too much. Now, this is actually somewhat difficult to do on Mechanical Turk itself, so I came across another resource called smartsheet.com that a peer of mine had been working with. He told me it was a collaborative website where you can essentially work with a spreadsheet, assign different rows to multiple people as workflow, and other cool stuff. But what really turned me on to it was that, since I think about 2009, you could put data into this system and send it directly, via a feature called smart sourcing, to Amazon Mechanical Turk. I saw it as a way better way to do it, because you can have the data come back to you and watch it populate the web-based spreadsheet in real time, and you can have data emailed to you. So it was a whole lot better. The trade-off is that you have to pay a little bit of a premium on top of what you would normally pay to Amazon Mechanical Turk. You probably can't really see what I have up here based on the font size, but this is actually two screenshots. The one behind is just the column listings for the questions that I asked, and the one in front is the interface for the smart sourcing feature, where essentially I pick the fields that I want displayed and the fields that I want responded to. The ones I wanted displayed were just the site name; I wanted the user to see the site I wanted them to collect data for. And then there
were a number of fields where I wanted them to populate data by selecting a value from a drop-down box or entering some text. I told it I would spend 25 cents per HIT. I tried this with all kinds of different pricing and attributes and got different levels of value from doing so. When somebody grabs a HIT, they essentially lock it, and I said I was only willing for them to lock it for an hour. And when I put out a batch of about 250 records, I wanted it to only be out there for about six hours. So I did this, and just to give you an example: this particular time, I put it out there for about six hours, and about 120 of the records were actually populated; 130 weren't. Not a big deal, because I could sort the data that came back, then grab the rest and send it back out to Mechanical Turk again using the same or different pricing. This proved to be a very easy way to collect data. I could also specify the quality of user I wanted to perform the work; there's a metric within Amazon Mechanical Turk called the approval rating. I said I wanted people who had approval ratings above 95. I could have gone as low as 85 or as high as 99. The trade-off is that the higher the approval rating, the more I would most likely have to pay before those people would perform the work; the lower the approval rating, the poorer the data returned to me was going to be. The first time I did this, I actually screwed up, and the site field was not populated, so people just started picking random sites and performing the work. One of these people actually contacted Smartsheet and told them, "I think something's wrong with this form." They contacted me, removed the job, and helped me figure out how to do it properly. This was another case where I lost a little bit of money.
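Those HIT settings (25 cents per HIT, a one-hour lock, a roughly six-hour batch lifetime, and a 95%+ approval rating) map onto parameters the Mechanical Turk API expects. Below is a hedged sketch that only builds the request parameters in the style of the AWS create_hit call; the talk actually posted work through Smartsheet's smart sourcing rather than the raw API, the title and description are made up, and actually posting a HIT would additionally need AWS credentials and a Question form.

```python
# Build create_hit-style parameters for one batch. The qualification type ID
# below is MTurk's built-in "percent assignments approved" qualification.
def hit_params(title, description):
    return {
        "Title": title,
        "Description": description,
        "Reward": "0.25",                     # dollars, as a string: 25 cents per HIT
        "AssignmentDurationInSeconds": 3600,  # a worker locks the HIT for 1 hour
        "LifetimeInSeconds": 6 * 3600,        # the batch expires after ~6 hours
        "QualificationRequirements": [{
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [95],            # approval rating must be >= 95
        }],
    }

params = hit_params("Record a site's password policy",
                    "Visit the assigned website and answer the questions")
```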
I also identified (yes, for science; for science, exactly) a better way than collecting the data via spreadsheet. Smartsheet also had a way to create a web form. I could have sent my volunteers a URL, and they could have just filled out a web form that was a whole lot easier to work with, and it would have populated either the same spreadsheet I was using for Mechanical Turk or a separate one. With the spreadsheets I sent out, one of the things I didn't do was take into account that people use different spreadsheet applications. I knew it worked in Open Office and I knew it worked in Excel. Some people use Numbers on the Mac; I didn't have a Mac, so I didn't realize that when the data was pulled into Numbers, all my drop-down boxes disappeared. Those people ended up responding with text that didn't match up with what would have appeared in the drop-down boxes, so I ended up with some data that was harder to work with. I went through a similar process to validate the data that came back from Amazon Mechanical Turk, and again, there were a lot of lessons learned here. One of the things I did was assign the same work to multiple people, so that if, for example, I assigned the same website to three people and all three of them gave me the same response, I could feel pretty good that the data was valid. If only two of them gave me the same response, it might be user error on the part of number three, or it could be because my questions were hard to understand. I had the option at that point of saying I didn't want to pay user number three because they did a bad job. In this particular case, I decided I was going to pay them all no matter what, because the reality was that the complexity of what I was asking them to do was substantially higher than what you would normally find on Mechanical Turk.
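That agreement check can be sketched as a simple majority vote. This is a minimal illustration, not the exact procedure used; the sample answers are invented.

```python
from collections import Counter

# Accept an answer only when at least `quorum` of the workers assigned the
# same site report the same value; otherwise return None to flag the site
# for manual re-checking.
def consensus(responses, quorum=2):
    value, count = Counter(responses).most_common(1)[0]
    return value if count >= quorum else None

consensus(["8", "8", "8"])    # all three agree
consensus(["8", "8", "12"])   # two of three agree; likely user error by one
consensus(["8", "12", "16"])  # no agreement; re-check by hand
```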
In fact, the folks from Smartsheet told me that they had almost never seen a request sent to Amazon Mechanical Turk as complex as what I sent out, because usually these are questions like "look at this scanned business card and put the phone number into a spreadsheet." Pretty trivial stuff. So I learned from this that you really have to make it pretty simple, and I would do it a different way, and will continue to do it a different way, because I still have more research to do. The result was that the accuracy of the data was not as good: it was about 84%.
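The accuracy figures mentioned (92% for volunteers, about 84% for Mechanical Turk) can be computed as the fraction of spot-checked answers that match a trusted value. A minimal sketch, with invented sample data:

```python
# Rough accuracy: the share of spot-checked answers matching either my own
# manual test of the site or another worker's identical answer.
def accuracy(checked_pairs):
    matches = sum(1 for reported, trusted in checked_pairs if reported == trusted)
    return matches / len(checked_pairs)

sample = [("6", "6"), ("8", "8"), ("12", "8"), ("6", "6")]
rate = accuracy(sample)  # 3 of 4 answers match
```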
So I started look at how much data I was able to collect. I collected data uh from 100% of the top 100 websites um both from my volunteers and I also looked at those websites myself. The volunteers were looking at less attributes and I was looking at a whole lot more. And then for each tier uh of websites as we move further down the rankings I collect I collected we collected a lower percentage of the total websites. But I was comfortable with that because it still allowed me to extrapolate what uh behavior and attributes would be associated with the security controls around those websites. I I don't believe for example with a tier 2 that if the additional 9% of
websites the data wasn't collected for are added to the scope of this that the statistics are going to be substantially different. Um there may be some anomalies but uh I really don't think it's going to be that much different. So here are some of the reasons that uh accounts didn't have data collected. So backing up I mentioned that I made initial assumption I could collect data for 70% of websites turned out to be closer to 59%. I don't I haven't looked into too much detail why people selected other but you can see some of the reasons why people didn't select data uh who weren't able to actually get bring it get any data back. Now some people
didn't do it because it was adult content, and I gave them that option. Other people were quite happy that they were asked to go to adult content websites. So an unnamed person in this room told me that he or she was asked by his or her spouse not to look at those websites again, and had to clear his or her cache.
Whoops. Cat's out of the bag. So, we're going to start to get into some of the findings here. And these findings are for all the websites that were looked at, so this was a few thousand of the 10,000 websites. I have the data to compare things more granularly, like to look at how the top 100 performed versus the bottom 5,000. I haven't included that within the presentation itself, but I am going to be making that data available later on. It's just that there's a lot of data, so I want to keep it pretty high level. So this actually surprised me: 40% of the websites had a method for logging in other than by creating an account directly on the website. So these are your Facebook, LinkedIn, Instagram, Twitter, Yahoo, etc. authorization systems. Does that number surprise you? Yes. So you would have thought it might have been lower than that? Yeah. Yeah. All right. So then I looked at minimum password length. During this testing, the first thing I asked people to do was to try a one-character password of the number one, with the feeling that if that worked, we knew the minimum password length was one character and that there was no composition requirement, because no one really has a complexity requirement of "make it all numbers." So then I also
asked them to move on to testing 25-character passwords that were just the digits one through zero, one through zero, one through five, because if that worked, it told me there likely wasn't a composition requirement and that a 25-character password was accepted. So what do you think was discovered about minimum password lengths? Anyone have any thoughts before I share the data? Six. What? So you're thinking six was the most common? Five. Oh, four. Four. Four. Four. Okay. So, let's take a look and see. So, the most common was actually six. What blew me away was that out of these thousands of websites, there was not a single website that required a minimum password length of nine or more, and 11% of the websites allowed a one-character password. So, you can see there are some common trends here. Five- and six-character passwords are still quite popular. Yes, there are some websites that don't even require a password.
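The two probes described above can be sketched like this. It's a minimal illustration of generating the test values, with a made-up function name, not the instructions the volunteers actually received:

```python
from itertools import cycle, islice

def digit_probe(length):
    """Build a numeric probe password by cycling the digits 1 through 0."""
    return "".join(islice(cycle("1234567890"), length))

# Probe 1: a single "1" checks whether a one-character password is accepted
# (and, because it's all digits, whether any composition rule exists).
short_probe = digit_probe(1)

# Probe 2: 25 cycled digits check whether a long, all-numeric password is
# accepted, which also suggests there's no composition requirement.
long_probe = digit_probe(25)
print(long_probe)  # 1234567890123456789012345
```

If the one-character probe succeeds, the minimum is one; if the 25-character probe succeeds, the maximum is at least 25 and composition rules are unlikely.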
So, I didn't actually look at that. That's a great point, and I think that's something that probably should be looked at in the next iteration of this; that's the kind of thing that I want some input on. I didn't actually have anyone test that at all. So, it's actually possible that on some sites a zero-character password would have been permitted, but that wasn't tested. Yes, it's also not too surprising that a lot of them aren't over that many characters, for user experience reasons. So that's not your maximum threshold, and that's why you got zero percent. That's a very good point. And let's keep in mind these are public websites visited by the general public. So if we were to look at corporate websites, I believe a lot of the data would look quite different. And that's one of the things that I want to extend this research to later on: looking at corporate websites. It's going to be trickier, because I can't just get an army of people to go out and look at corporate websites; I have to identify or find volunteers that actually have access to those corporate websites and are willing to share that data. So, what about maximum password length? Now, keep in mind, I did have some users actually try passwords way longer than 25 characters, but what I
asked them to do was try a 25-character password, and if that worked, not to try anything longer than that. And the reason I chose 25 was that if we want to advocate passphrases, that seemed like a substantial enough length to facilitate them. So, what do you think we found about maximum password lengths? 12. So, you think 12 or 15 was pretty common? So, 54% of websites did allow a 25-character or longer password. In fact, I had some people try 8,192-character passwords, and they reported that the value seemed to be unlimited. You can see this is one of the fields there wasn't a dropdown box for, so these are actual text strings that people entered, which I had to process a little differently. Everything worked out, but I really appreciated the enthusiasm of the people who were doing the work with me. I found it interesting that there were no maximum password lengths between five and nine. There were a number of websites that allowed four-character passwords, and the ones that I have personally inspected so far were all four-digit numeric PINs; I'm going to talk about brute forcing one of those later on. Sorry? There are also cases where you can input an 8,000-character password which gets truncated. Yes, that is true, and that is one of the set of
attributes that was looked at for the top 100 websites. I'm not sure it's within the presentation, but there were any number of websites like that. In fact, the person that looked at fedex.com is, I believe, maybe in the room. Does that person want to self-identify? Okay. Did you look at FedEx.com? So, do you recall what you found with FedEx.com? So, I believe it was FedEx.com, but it doesn't really matter what website it was. There was a website that someone found, and they were very enthusiastic about the fact that it said there was a certain minimum password length, I think it was six, but when they tried to create a six-character password, it then told them, "Sorry, it has to be at least eight." So there were some inconsistencies like that. There were a fairly large number of websites I looked at in the top 100 that allowed me to create a very long password, but when I went to log in, there was a maxlength attribute set in the HTML for the input field that truncated it. In some cases, I could just tell my browser to ignore that and have it send the longer password, which is kind of a cool feature, because if an attacker is going to perform an attack against the actual web interface and they don't do that, they may not actually be able
to gain access to my account. But that occurred, and the opposite occurred; there were a number of anomalies in both directions with inconsistencies like that. All right. So, I want to look at the user experience. Two of the questions we looked at were: when was the user told there was a minimum password length, and when were they told there was a maximum password length? We looked at things like: were they told before they even entered one? Were they told once they started to enter a value into the field? Was there an icon they could click on for more information? Were they only told when
they spent the time to think of a really great password and submitted it, and the site then said, "Hey, you suck. That password doesn't work," even though it never told you what the criteria were? So, what do you think was the most common scenario for minimum passwords? Yeah. So, that is the most common. Almost half the websites didn't tell you what the minimum password length was until you actually entered a password. So, I imagine that's pretty frustrating to users; I know my testers were somewhat frustrated by that. And when we move on to the maximum password length, it's interesting, because it looks a little bit different. The vast majority of the time, the user was actually never told what the maximum
password length was. And that's especially troubling if they want to create a passphrase or something that's longer. Say they want to create something random using one of their password generation tools; they create it and they paste it in. When they create it, there might be a max length at that point, so it might get truncated there, and it might get truncated when they log in. They may never know that their password is shorter than they intended. And if it gets truncated in one place versus the other, then there are two different issues that result from that. Then I looked at password composition
requirements. Only 25% of websites had a password composition requirement. What we looked at there was: was it a combination of letters, numbers, special characters, etc.? Did it provide any kind of guidance in terms of "here's how you should create a password," even if it wasn't a good one? And kind of overlapping with that, another question we looked at: does the login destination page use SSL? I was really clear about this, though I believe a lot of the data was actually bad in this particular area, because some of the volunteers on Mechanical Turk didn't really know what SSL is. The way it was framed to them was HTTPS. And I was very explicit about the fact that I was talking about not the page where they started to enter the password, but the page that it took them to after the password was submitted, because in some cases you'll visit a page that's not over SSL, but it submits the form post to an HTTPS page, so that's over SSL. I'm ignoring at this point whether it had cookies that were shared over plain text, or cookies that were trivial to enumerate, etc. It was just: was SSL being used? So, does anyone want to take a stab at whether the login pages used SSL? 50. Any higher numbers? Any lower numbers?
38. Someone's probably in the ballpark. Oh my gosh. So, I think my OpenOffice Impress skills failed me here. What I was going to do at the end of the presentation, I can do right now. So it's 53% used SSL, 36% didn't, and for 11% of websites the response was unknown. And that's largely because of Ajax technology, where you would potentially get a popup to enter the username and password rather than being taken to a destination page. I told my volunteers, if you encountered that, don't bother to try to find out whether SSL is used. I did that for the top 100 by using tools like Firebug, a Firefox extension which allowed me to inspect all the communication. There are a number of tools that you can use to do that. Was the password automatically emailed after account creation? Anyone want to take a stab at that one? It's actually quite a bit lower. This was one of the few things I felt really good about; very few websites were actually doing this. My wife did come across a website that did something pretty stupid, though. It was Nickelodeon's website, Nick.com. She created an account, and her security question was "What city do you live in?" So, she provided a real city, which is trivial
for an attacker to identify. She went to the forgotten password tool and it popped up on the screen: "What city do you live in?" So, she entered it, and it immediately popped her password up on the screen: "Your password is 1234." Nice. So, we did learn that they accepted a four-character password and that it was probably stored in plain text. Nick.com appears to have very poor security controls. I also looked at whether the site discouraged users from using the same password they use elsewhere. This is a pretty basic tenet of password policies. We tell people over and over again: don't use the same password on site A that you use on site B, because if it's compromised and leaked, someone that finds it could gain access to the other site, right? So, I was expecting, or hoping, that websites were telling people that, helping us security professionals and enthusiasts spread the message that reuse is a bad idea. So, let's take some guesses here. Okay, so we're really pessimistic on this one. It's 4% of websites. All they need to do is put up a one-liner that says "please don't use the same password you use somewhere else." But to your point, that could potentially cause some usability issues, because if users have to remember different passwords everywhere, they may forget the password, get frustrated, and it may result in customer service
issues. So from the site's perspective, there might not be a reason for them to actually share that advice. I didn't attempt to draw conclusions based on the data at this point; I've just collected the data. But that's a theory of mine. Does the site educate users on how to create a strong password? Now, I didn't ask anyone to judge whether they thought the education was appropriate. So, if the site said a way to create a strong password is to take your pet's name and enter that, I personally consider that a yes, the site did educate users on how to create a strong password. An additional attribute I looked at for the top hundred was whether I actually thought that was a good, strong set of guidance. So what percentage of sites do you think did this? Three? All right. So less than 10%. You know what? This is actually proving that crowdsourcing is pretty effective, because you've largely been in the ballpark with these answers. Is there a password strength meter or indicator on the site? An indicator to me meant something like a check mark versus an X, or good, bad, poor. Only 7 or 8% of websites did that, so that's fairly low. In the top 100, I did look at that, and a much higher percentage of sites in the top 100 implemented this particular control. Is it possible to have an email sent as
part of the forgotten password process? 85? 85, 90? It was a fairly high number: 76%, which should have showed up that way on the slide. So, one of the things I ask myself, and I ask you to think about, is: if almost every site does this, do we really need security questions? Should we have security questions and this capability, or security questions or this capability? And I'm not saying either of these is the best way to allow a user to regain access to their account, but it's just a question that I ponder. Do they use two-factor or two-step authentication? Question? Yes.
How do you reconcile that?
Oh, okay. So, yeah, only 2% of websites within the sample set did that. This adds up: creation time versus recovery time. Yeah, exactly. So the first question was: did they send it to you in plain text, even though you didn't ask for it, immediately or very shortly after you created the account? And this one was: you entered your email address because you forgot your password, and it sent the password to you. So that's where the difference is in these two numbers. Great question. Thanks for helping me on that. What about two-factor authentication? 5%? 5%. What about anybody in the room that might work for a two-factor authentication company? Anyone in the room want to take a stab at that? Less than one. We're in a niche market. Zero. Yeah, I say 2%. 2, 3%. I wasn't expecting it to be anywhere near as high as it would be in the corporate world, but yeah, 5% is what we came across. Is it optional or mandatory? Optional in this case meant it was an option. With the top 100 websites, I looked at whether it was required, but the larger study only looked at whether it was an option. What percentage of your sites were financial? I haven't looked at that yet, and I'm going to talk about the future research and how I'm going to go about doing
that. I intended to gather that data and I haven't yet for some of the websites, so I couldn't tell you what percentage of the top 10,000 were financial. What I can tell you is that I didn't ask people to collect data about the financial websites that they did business with: if I assigned them citi.com, they might have a Citi account or they might not. So the financial websites I got very little information back on. I do have a large number of financial websites that I utilize, well, I'm not going to name them, and so I looked at those specifically. And that takes us through the bulk of the questions that were common to all three tiers of the websites that were looked at. For some of the other tiers, I looked at things like: were the last login time, geographic location, and IP address presented to the user after they logged in? Was that displayed to them immediately after logging in, and was it above the fold or below the fold? Did they have to poke around the site to find it? Facebook, for example, will show you information on machines of yours that have open, active login sessions, but it doesn't tell you that after you log in. You actually have to know where the heck you're going to come across it. So I
believe a typical user of Facebook probably doesn't even know that capability exists and wouldn't think to look for it. So, I was looking for a graphic on brute forcing and came across this. I didn't know this was a movie; I think it's from the 40s. It's a movie I think we should all watch, because one, it's called Brute Force, and two, it has Burt Lancaster without a shirt, and what more do you need than that? And if you look at the line at the top, "men caged, driven by the thought of their women on the loose," it sounds like it might be a really good movie. So there was a top
200 e-commerce website I looked at. I created an account, and it only let me use a four-digit PIN. I thought that was fascinating. So, I created multiple accounts, and I started to go off track and do things like look at whether I could change data. Long story short, I found out I could change the credit card on account two from my account one. I'm not going to tell you how I did that, but I believe I could probably do that for every single user on this e-commerce website. Eventually I'm going to tell them. But the bigger deal was that I decided it would be interesting to see whether I could brute force my own account, because, you know, with a four-digit PIN there are only so many of them. At first I did that low and slow, and I was able to; then I sped up the attack, and if they detected it, they didn't do anything to stop it. So my belief is that I could perform a brute force attack against any user on this e-commerce site simply by knowing their email address. So those are some of the kinds of things that were done that were more automated. But to do the brute force attack, I still had to collect data like the URL that the form was posted to, what the cookie variables were, and
the GET variables. So there was quite a bit of work to do. These were the two questions I really sought out to answer at the beginning. And my belief, well, I'm kind of curious what your belief is: are sites doing a good job protecting user accounts, just based on what I shared so far? Who says so? LinkedIn. LinkedIn. LinkedIn says so. All right. So, how much control do security-conscious users have? I think they actually have fairly little. On a site-by-site basis, there is a lot of control that you have, but overall, no. Yes. Okay. Yeah. All right. Good. We're good. So, all I have to wrap up really is a
couple slides about lessons that I learned. One of the things I learned was that it was difficult to gather some of the data, just because it required creating multiple accounts; if users assumed that, for example, they could just go to the change password screen and determine the minimum and maximum password lengths, that wouldn't work, because those are potentially different policies than on the account creation screen. In terms of a better, more iterative way: if I was going to do this again, and I am going to continue this research to collect the data on all 10,000 websites, because I don't have all of them yet, I'm going to use Mechanical Turk, but the first thing I'm
going to do is put out a job that costs me less than 25 cents, just to ask one simple question: is there a way to create an account? Instead of paying 25 cents to find out there isn't a way to create an account, I pay them two cents to find that out. Also, to your earlier point about what percentage of these were financial sites: I'm going to put out a separate set of requests to Amazon Mechanical Turk to ask people to categorize the sites in terms of financial, entertainment, pornography, etc. I think that should be fairly easy for them to do, even if I have two people looking at each site,
and I should be able to get the data fairly accurately. And that really ties into: keep it simple if you want to have people on Mechanical Turk do this, because what I asked them to do was very complicated, and that's why I had to pay more and why my data wasn't as accurate as I would have liked. Also, I spent some time talking to people, technical and non-technical, to get their input, and they provided a lot of really good value. So I would say that if you're going to do this, definitely don't do it in a bubble; talk to other folks. Here are some people I'd like to thank that gave me a
lot of help. My wife, who is an analyst with my consultancy, did a very large amount of work. She's in the front row, so I want to thank Tanya; I appreciate all your help. And all the people in this room that helped me, and all the people that are not in this room that helped me. There were many hundreds of records retrieved by these people. I know it was a not insignificant amount of time that was required to do that, but I hope they had a lot of fun doing it. So we're kind of running out of time; we have two minutes. So, I'm not going
to actually ask you to answer these questions, but these are things I want you to think about that if you want to track me down or contact me via social media or email or the phone that I'd like your input on because I want to continue this research and I want to make it of value to you. So, here's what I'm doing next. Um, I'm going to be sharing this data over the web. It's going to be more granular so you can dig into it. Eventually, you'll be able to go actually look up a website and look at what information's out there. Uh I'm also working on a scoring system that I haven't decided whether it's going to be
qualitative or quantitative. It's going to look at those broader categories of controls, so that if somebody goes to look at, say, the top 10 banking websites, instead of just looking at where they have branches and what their fees are, they can go to this website and see what their security controls look like, to determine whether that's something they want to include in their decision-making process. So here's how you can find me. I think we probably have about a minute left for questions, but I'll also be available after the presentation. So, any questions? Thanks. Any feedback from the volunteers? Any input you want to share? Two things. One, the data set, the way you structured it at first and how you get through it; the structure, that was really [inaudible] from your perspective... to understand, hey, I got one password... it was just amazing going through that experience. I appreciate the input, and that's something to keep in mind if you're going to use Amazon Mechanical Turk: usually you want to put jobs out there that don't require a lot of learning curve, and there was a learning curve for this. So, I would not put this out there via Mechanical Turk directly, because when you do that, it's one site at a time. The way I put it out there through Smart Sheet, I could give a bigger block and people could pull multiple sites down, because the learning curve for one site would just be tremendous. So, that's all the
time I have. I appreciate you being here to listen to my presentation. Again, if any of you have any input now or later on, I'd love to hear it, so I can take this in the right direction and it can benefit you. Thanks. [Applause]