
Using Behavior to Protect Cloud Servers

BSidesSF · 2016 · 43:25 · Published 2016-04
Category: Technical
Style: Talk
About this talk
Cloud server adoption has exploded in the last 5 years. Nearly every business is using some kind of IaaS or PaaS platform. Securing these cloud servers is challenging: the ease of access by developers, contractors, web admins and more needs to be balanced with security. Rule-based access security can only go so far. Once SSH keys and tokens are compromised, an attacker can wreak havoc. Behavior-based real-time analytics can help create a dynamic fingerprint of an automated service like Jenkins or of an employee. We will show an example of dynamic privilege management to identify and stop insider threats and privilege escalation attacks in real time. See how you can apply next-generation privilege management principles to secure your assets.
Transcript [en]

I'm going to introduce Dr. Anirban. That's tough for an American to say, isn't it? So, you've been a serial entrepreneur. Good for you. Thank you. You have a couple of patents and you graduated from UC Riverside. So you benefited from an American education. I did. Congratulations. I did, too. And you have some National Science Foundation grants. What are you going to talk about today? We're going to talk about protecting cloud servers using user behavior. Cloud servers and artificial intelligence. All right. Perfect. I'm interested. Thank you. Yes. Eat the mic. Okay. Hello everybody. So this talk is going to focus on something that

has proved to be really useful for us internally as a company and for some people that we have worked with, and so we want to share our experiences with what has worked, what has not worked, and how you can use anything that we've learned. This talk is going to focus on protecting cloud servers, anything like Amazon servers, Rackspace servers, whatever it is. And the reason why I want to present this information is that as an SRE, as a DevOps engineer, as somebody who's doing security inside an organization, it's important to conserve your time. There are so many issues that happen every single day that you're chasing after

rabbit holes. You need to understand exactly what's important and what's not that important. The second thing is to maintain our sanity, because when things are on fire, the C-level wants everything to be solved in the next 5 minutes. That's not going to happen. We're going to talk about the status quo when it comes to how you currently protect access to cloud servers, anything like AWS or Rackspace as I mentioned, what the various challenges and threats are, and ultimately we're going to segue into how you can use user behavior in your organization, without any vendor lock-in, and have better security and control over your infrastructure. So let's jump in very quickly. I'm going to gloss over a couple of

these slides because everybody here understands what I'm talking about. Cloud servers in most organizations are chopped up into various layers. There's AWS-like infrastructure as a service. Then there are Heroku- and Google Cloud-like platform services, where you expand as you need to and don't worry about the actual server underneath, and so on and so forth. Then there's a new entrant, Docker. There's Microsoft Azure, etc. These are the things that we're going to talk about. It's also important to understand, when you're solving a security problem using user behavior, who is actually using these servers. The majority of the time in most organizations it's going to be SRE, DevOps, IT, developers or security. But there's also other people who are

accessing these servers for different purposes. They might not be as intense or technical as you, but they are also using it for their purposes. It might be marketing; sometimes you see in companies that marketing wants to have its own infrastructure, for some reason they want to have their own things running on their side. And now what we see in the last couple of years is also the rise of automated software. Most people have heard of or even used Jenkins, or some kind of automated testing or Bamboo-style build software. These pieces of software are also now accessing your cloud servers. It's not just a human. It's also automated processes at scale. Not just a couple of

accesses a day, but thousands and maybe millions of accesses a day. How do you make sure it's actually authorized and it's doing the right thing, etc.? We'll talk about that. Very quickly, the status quo right now in most companies — and there are exceptions to this — but in most companies people are using LDAP-based username/passwords or AD/Kerberos-based username/passwords. That's how you access a server. That's one way: if you have, in traditional companies, servers inside the network, you're accessing them in that way. People think that's a great way to do things because there is no user account on the server; it gets created when you log in. So that's security,

maybe. But the point being that the traditional methods are still being used. Furthermore, SSH keys are a very popular way of logging into servers. And one of the primary motivators for using SSH keys in most organizations is the ease of use. People don't want to type passwords. I just want to type my SSH command and get logged in. If I have a script, it gets logged in, does what it needs to do, etc. Some mechanisms like that. On top of that, people use IP filters to say, well, this particular group of infrastructure can only be accessed from this bastion host, or from this particular group of infrastructure, etc., for computer-to-computer communication. And there's of course VPNs

and such. So these are the very high-level, coarse-grained buckets of approaches that people take in order to protect access to cloud servers. On top of that there is another layer of abstraction for directory services, where you store information about who is in your company and who is allowed access. So it's one more layer above that, and some of the popular services that we've all heard about and used are Microsoft Active Directory, OpenLDAP, and now even Workday, which is not quite intuitive, but people are using Workday's SaaS-based solution even for directory services. Beyond these directory services, the previous speaker mentioned IAM, identity and access management; that is also now becoming a big part of

organizations, where products like Ping Identity, Okta and so on do single sign-on for web apps, and they're trying to do IAM for all kinds of things. There is a lack of IAM for infrastructure; there is no obvious IAM for infrastructure; the whole system is very fragmented. So let's look at the challenges and threats. Some of these we can relate to directly, but some of these are actually not that obvious, because as companies grow, especially in the Bay Area, if you have a startup you want to hire the right person for the right job, and it doesn't really matter whether that person is in Slovenia or whether they're next to you. It matters a

little bit. Of course it does matter, but you want to get the best person for the job. So you have distributed teams. In traditional companies, everybody would be told: you need to come into the office, we're going to work from here, this is our place, etc. Now you have geographically distributed teams. Furthermore, shadow IT has become a problem. In some of the compromises that I've seen in my experience, shadow IT accounts have been a big enabler, and the reason why shadow IT accounts sometimes get used in this manner is because the security level of each one of those organizations might not be as stringent

as yours. You don't even know who is actually accessing your server. All you know is that you gave an SSH key or a username/password pair to somebody in some different country, and that's being used to access your servers. You don't know if it's them or not. You don't know if they're doing the right thing or not. You have some checks and balances, but that's another one of the challenges. The next one is the high velocity of changes. We've changed the way we develop code. We want to have our own Docker instance. We run the tests on our own Docker instance, everything is done over there, then I'll push code over there. It didn't used to be like that.
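That stale-credential problem is easy to start auditing yourself. Here is a minimal Python sketch; the key entries and the employee roster are made up for illustration, and a real audit would pull the roster from your directory service and walk every server's authorized_keys files:

```python
# Sketch: audit authorized_keys entries against a roster of current
# employees. Entries whose comment doesn't map to a current employee
# are candidates for removal.

def parse_authorized_keys(text):
    """Yield the trailing comment (usually user@host) of each key line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # key lines look like: <type> <base64-key> [comment]
        yield parts[2] if len(parts) >= 3 else "<no-comment>"

def stale_keys(authorized_keys_text, current_employees):
    """Return key comments whose user part is not a current employee."""
    return [c for c in parse_authorized_keys(authorized_keys_text)
            if c.split("@")[0] not in current_employees]

keys = """\
ssh-ed25519 AAAAC3Nz... alice@laptop
ssh-rsa AAAAB3Nz... bob@desktop
ssh-rsa AAAAB3Nz... mallory@oldbox
"""
print(stale_keys(keys, {"alice", "bob"}))  # -> ['mallory@oldbox']
```

This only inspects key comments, which are self-reported; a stronger version would fingerprint the key material itself against keys issued through your onboarding process.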

So there's a very high velocity in what you're actually doing. You're also consuming cloud resources at a much higher velocity, where AWS, Rackspace, every one of these providers has given you an API, and all you do is hit the API, you get a new server, you're good to go. You load it up from an existing image; you don't have to spend time doing all the configuration stuff. Furthermore, another thing that we have seen is employee churn; that also contributes to the problem, because as people leave the organization, it's not always easy to take away their SSH keys. I bet you that if you talk to a 100 companies, more

than 90% of them will tell you that a previous employee's SSH keys are still on a server. It's just a fact of life. People want to do SSH key rotations; how many actually do it? Very few. Good for you. Also, the attack surface is changing: horizontal movement, vertical movement. It's not the traditional types of attacks that are coming in. People are going after privileged accounts. They no longer want the sales guy's account, because that's too hard to get value from; with vertical privilege escalation, let's get the account of an administrator and see what we can do from there. So the attacks are also changing. I'm going to skip over this slide; we all

understand what horizontal movement and vertical privilege escalation are. Let's talk about going forward: how do we protect ourselves and our infrastructure better? The first thing that I want to talk about is the concept of least privilege. Right now, if you get a cloud server, you get an account. So let's say I go to Rackspace. I have access to the Rackspace console, I start a cloud server over there, and I can connect via a terminal. I can basically do anything that I want on that server. Now, that might be okay for specific situations, but do you want to make that the default policy, that you should be able to do everything on a

cloud server, or not? Shouldn't you have to earn the right to run certain commands, certain scripts, or do things, versus being given the right from the get-go to run things? That's something to think about. The second thing is that any action that people take has to be risk-scored. It cannot be without consequences. If you run rm -rf * on a production cluster, you should be fired. So everything should have some kind of consequence attached to it. You need to understand what the riskiness is, for lack of a better term. English is not my native language. (It's not ours either. It's okay.) So in order to do that, what you need

to do is build a pipeline, and we'll show you how we have done it and how you can do it very easily. You have to have a pipeline that actually analyzes all the commands that are being shipped to your actual cloud server from the terminal on your laptop. And ultimately the model that we are going to build is based on these four fundamentals: learning, matching, acting and updating. We'll talk about how you learn, how you match, what you match, how you act and what you update. So all this will be coming. Let's jump right into it. When we talk about cloud servers, what is it that you should be looking

out for if you want to do some kind of profiling? Ultimately you're trying to build a model to say: John uses server X in this way. What does that statement actually mean, "John uses server X in this way"? You have to profile what John does. One of the things that you can do is analyze the commands. What does John run? What level of access does John have? Is he in the administrator group or in some other group? Where does John log in from? Etc. So, for example: you've never typed anything related to /etc/shadow, ever. Why are you typing something related to /etc/shadow now? Those are things that you

need to analyze, and you can flag them as anomalies and such. The next thing is that we also look at connection statistics: geolocation-based signals, whether you're coming from a VPN endpoint or not, how many connections you've opened up to parallel server clusters, etc. These are all signals and features that you can use in order to say whether a particular person or access to a server is valid or not. Is it being used in the right way or not? Another thing, when it comes to taking action: with any machine learning or AI — I prefer to call it machine learning; AI is kind of out there — any

machine-learning-based system will have false positives. There is no system that is 100% accurate. If anybody tells you "my false positive rate is zero," don't talk to them. It's not possible on real-world examples. So what you can do when you have false positives is have a graceful way of doing real-time adaptive two-factor authentication, and we'll show you how we have done it on our side. That's another benefit. And ultimately we'll show you the tools, like Apache Spark, scikit-learn, whatever we have used to build things ourselves, so you can go home and get started, and we'll even give you tips about what exactly to look for in these tools so

you don't waste time figuring out a bunch of things that have been figured out already. One of the ways to make a case to management, to make time and space for an effort like this, is compliance. They all love the word compliance. Anything with compliance: oh yeah, yeah, we need to buy that tool, we need to do that. They might not understand what's actually going on under the hood, but they love the word. There are legal consequences of falling out of compliance, whether it's PCI or SOX; there are different types of compliance requirements, etc. What we are proposing here also helps you with compliance, because for different types of compliance requirements —

let's say one compliance requirement says that if an admin account fails to log in three times on a server, you need to generate an alert — things like what we are talking about help you actually do something like that. It's a tunable system. So now, we've talked about who is using IT resources, what the IT resources are, why this is good, all that stuff. Let's talk about behavior. What do we mean by behavior? Behavior is basically a marker of your identity; that's the way we think about it. So when we talk about behavior, we are talking about what types of commands you are using. The first easy thing you can do: most

Linux systems will have some kind of a history file on them. You can grab the history file and see what commands have been run, etc. And you can form a Bayesian model very easily. Frequency analysis is a very easy way to say: out of the 100 unique commands that I see, these are the top 20, these are the middle 20, etc. So this is very simple, low-hanging fruit that you can apply. Any command that doesn't fall within the top 50% bucket, mark it as "I don't know, I need to verify: do you really want to run this command?" And you can have the Twilio API basically send an SMS to the guy, whoever's phone it is,

who's trying to run it, and they reply back, etc. You can build your own nice little system like this. So that's one of the things you can look at: what types of commands, and directly going to the history and grabbing stuff from there is an easy way to get started and bootstrap the system. Furthermore, it's not just the commands you run, it's the style in which you actually run the commands. What do we mean by style? A lot of people will chain commands using semicolons, or they have a specific propensity to use a couple of commands with pipes, and so on and so forth. Every person is a

little bit different. So we are trying to understand, using frequency analysis, how many times the semicolon comes up on average in the commands that you're typing. Are you piping things? What are you doing? Things like that. So to give you an example of what we talked about, here is a very quick screenshot. I apologize I don't have a live demo right now, but it'll be available on my website by the end of this week. This is basically a very quick example of a screenshot that just shows that I'm typing a couple of commands over here. What am I typing? I'm just using a well-known piece of

software called ClusterSSH to log into two servers. I'm specifying where the server is and I'm specifying the port. So when I type this particular command, I tend to chain things using semicolons, and I just want it to be on one line. There's no good reason for it; I just want stuff on one line. I like one-liners; everything is one line. So that's something you can focus on and say: well, this person, when they type commands, usually writes stuff in one line. Why are they using something else? Those are also markers of your particular behavior. Furthermore, we've done

this research where you can easily tell whether it's the right person or not by looking at the mistakes they make when they type commands, because the way you type is actually pretty unique. You will make the same mistakes. If you type ls as li for some reason, you will keep making that mistake: you'll use the backspace, you'll come back, you'll correct it, whatever; you will do your thing over there. Furthermore, what we have also found is that time of day is not a very good indicator of who is accessing and whether it's good or not, but it is a good, very coarse-grained

marker. So for example, you cannot use time of day to say whether it's Anirban logging into the AWS server or not. But what you can use time of day for is to say whether there's an attack going on on that AWS server, because usually during this time window there are between 20 and 50 connections on that server, and now I see 150. So there's a difference in how you use each feature: do you want to bring it down to the granularity of the person, or do you want to bring it to a much coarser granularity? Another thing that we use is type of resource. So if you're going to assign a

risk score to something, you have to also understand how important the resource is. Is it a Docker container that I'm not going to need after 30 minutes, or is it a production server? Which one is it? So the risk score also has to be predicated not just on your user behavior — the commands, the style and such — but also on what type of resource you are using. As we talked about, if you look at the types of commands that people type, you can form very interesting buckets of commands. Some of the commands are related to the network: somebody's checking how many packets are coming in, whether TCP is working fine, and so on and so forth.
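The bucketing idea can be sketched in a few lines of Python. The bucket assignments below are illustrative, not the talk's actual taxonomy; a real deployment would maintain a much larger command-to-bucket mapping:

```python
from collections import Counter

# Illustrative command buckets: network tools, system-stats tools,
# developer tools. Anything unknown falls into "other".
BUCKETS = {
    "netstat": "network", "tcpdump": "network", "ping": "network",
    "top": "stats", "vmstat": "stats", "iostat": "stats",
    "git": "dev", "vim": "dev", "make": "dev",
}

def bucket_profile(history):
    """Fraction of a user's commands falling into each bucket."""
    counts = Counter(BUCKETS.get(cmd.split()[0], "other") for cmd in history)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}

ops_history = ["tcpdump -i eth0", "netstat -an", "vmstat 1", "top"]
print(bucket_profile(ops_history))  # -> {'network': 0.5, 'stats': 0.5}
```

Comparing a session's bucket distribution against the stored per-person, per-server profile (e.g. an SRE whose history is dominated by network and stats commands suddenly producing mostly dev commands) is exactly the kind of group-membership signal described next.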

You can even create profiles of the various buckets of commands that a person runs. You will find that with SRE/DevOps, the percentage of commands they run for network stuff and system stats is much higher than anything else. Developers are committing code to GitHub, uploading, downloading, whatever they're doing; it's slightly different. That's another marker to say: are you in the right group or not? I know who it is, I know he's coming from a known IP, I know these are the types of commands he used; let's also analyze the group of commands that he ran. Is he falling in the

network bucket? Is he working in the stats bucket? What is he doing? And as I mentioned, we try to do a per-server, per-person profile, and that's important because the way you access your Docker container is, again, not the way you access your production server; there's a difference there, so you need a different profile for each. When we talk about creating feature sets, the point is basically that when you look at these commands, you ask what is important. You can do frequency analysis; that's one easy way to get started with these feature sets, and

you have to feed these feature sets to a classifier, something like decision trees or AdaBoost stumps, anything of that sort. One easy way to get started with this is a tool called Weka — W-E-K-A — or scikit-learn. Both of these are really easy. Weka has a graphical user interface: all you do is feed it a file, basically a file with a bunch of CSV entries in it, it will load it up, and you can run different types of tests on it, etc. scikit-learn is a little more powerful; it's in Python, and it's more intuitive in my opinion. But you can try any one of these, and

you'll find that it's easy for you to identify a few features that make meaningful sense for your classification problem. Ultimately what you want to do with all this feature classification is obtain a score, because you don't know for sure if this person is actually who they say they are based on their behavior. What we are getting is a confidence score: 0.85, 0.95, whatever it is. How much confidence do I have in saying this is the guy or not? You then have to set thresholds to say: you know what, if it's below 0.75, just don't let the person access AWS; it's too much of a risk. And this

is one of the problems with this approach: there is no one-size-fits-all, there is no magic bullet that is good for you and good for me. The number of false positives that I can tolerate in my company is different from the number of false positives that you can tolerate in yours. So there has to be a way to change those markers and say: is it 0.75, is it 0.8, is it 0.9? How accurate do I want to be, how pedantic do I want to be, etc.? When we talk about classification, there are two basic groups: one is supervised learning and

one is unsupervised learning. For supervised learning there are various classification techniques, like Naive Bayes or SVM. If you get your hands on Weka or scikit-learn, I would recommend you go with SVM. SVM is a simple method, it's easy to understand, and it helps you in that if you have n-dimensional data, it's easy to create a plane through that data to separate the data points into good and bad. I would encourage you to try out different models, different algorithms, everything, but if you're looking for quick bang for your buck, go for Bayes or SVM; they'll give you good results very quickly, and there are specific reasons as to why.

For unsupervised learning you can use something like k-means or expectation maximization, etc. But my favorite is SVM; it works really nicely for us. One thing to keep in mind: as we look at all these features and we say, "oh, maybe this typing style makes sense, maybe the number of semicolons makes sense, maybe this makes sense, maybe that makes sense," what you're actually doing is creating all these feature sets and asking, is this important, is this important, etc. If you use Weka, it has a very nice, easy graphical way to also show you something called PCA, principal component analysis. It's a fancy word, but all it

basically does is show you, out of all these features, which ones are really important. Which ones should you be focusing on? Should you be focusing on everything, or on these five or these 10 instead of the 50 that you think are important? Because ultimately what you want is a very targeted set of features to look at, to get good prediction accuracy. Another thing to keep in mind when you do this: more data is not always better, primarily because there's bias in the data. Sometimes the data is dirty; there are different types of issues with it. So what you should try to do is

get a chunk of data and develop your system on top of that data. If you see you're going in the right direction, then you can add more data to the classification problem itself. But don't just add more data like, "oh, I have 10 GB now, so 20 GB will obviously be better, and 30 GB will obviously be better." It doesn't work that nicely. I wish it did, but it doesn't. There's bias, there's noise, there are different types of issues. One of the common things we sometimes get sucked into is "more features is always good." If I have to say, is this a phone? Well, it's white, so it's an iPhone, and it's like this, and it has glass in the front, and

it's got a battery. Obviously, it's a phone. So the normal way of thinking is: if I add more features to this classification, it's obviously going to be better. That's not always the case, primarily because if you think of a three-dimensional space with points in it, as you add more features, the envelope that can separate whether a point is inside the bubble or outside the bubble is no longer a sphere. It turns into an ice crystal, where each data point sits at the end of a spike of the crystal. The problem with something like that is that in the real

world, when you throw data points at this ice crystal for classification, they'll tend to fall outside and not really give you any useful measure of accuracy or prediction. So having more features doesn't always mean you'll have better prediction. Now, how do you actually make the data actionable? Let's say you found a way to qualify and quantify that this is Anirban accessing this server, and everything seems good. Now, if everything doesn't seem good, what then? How do you make it actionable? One of the easiest ways is that you can kill the session; you can change out the SSH keys immediately and not let that person have access again. If you don't want to go to

that extent, what you can do is build your own custom 2FA implementation, where you can just send an SMS to somebody, or use an existing API that another service offers, and just ask the person: hey, use the fingerprint sensor on your phone and tell me you're trying to run rm -rf *. If they do, sure, let the command go through. And this is a graceful way of handling false positives, because otherwise you would need an IT administrator or a dedicated security person looking at all the commands that get flagged and making a decision: does he really want to run this thing? Why guess, when you can have the person say, "Yes, I really want

to run this thing." There's a slight difference there, but it works really nicely. Another thing is that if you go down this route, you can have some really nice layered authentication and authorization going on. Say, for example, I'm trying to access a server and I'm trying to do something funny to that server. You can send me a two-factor notification asking: Anirban, are you trying to run this particular command on this server? At the same time, you can send a notification to the IT owner or the business owner of that specific server cluster: Anirban is trying to run rm -rf *. Should we let him run it or not?
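The adaptive 2FA gate described above can be sketched like this. The `confirm` callback stands in for the out-of-band prompt (an SMS via something like Twilio, a push notification, a second approver); the risky-pattern list and threshold are illustrative:

```python
# Sketch of an adaptive 2FA gate: trusted sessions running benign
# commands pass through; anything risky or low-confidence requires
# an out-of-band confirmation before it reaches the server.

RISKY_PATTERNS = ("rm -rf", "shutdown", "mkfs")   # illustrative list

def gate_command(command, confidence, confirm, threshold=0.75):
    """Return 'allow' or 'block' for a command in a scored session.

    confidence: behavioral score in [0, 1] for this session.
    confirm:    callable(command) -> bool, the out-of-band prompt.
    """
    risky = any(p in command for p in RISKY_PATTERNS)
    if confidence >= threshold and not risky:
        return "allow"
    # Fall back to asking the human (or the resource owner) directly.
    return "allow" if confirm(command) else "block"

# A benign command sails through; a risky one is blocked unless confirmed.
print(gate_command("ls -la", 0.9, confirm=lambda c: False))        # allow
print(gate_command("rm -rf /data", 0.9, confirm=lambda c: False))  # block
print(gate_command("rm -rf /data", 0.9, confirm=lambda c: True))   # allow
```

A two-man protocol is just a stricter `confirm` that requires both the user and the server's business owner to approve before returning True.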

So you can have two-man protocols, three-man protocols, whatever you like; it's all based on this same mechanism. You can also do dynamic ACL modification, the ultimate goal being: if you are trying to do something bad, we restrict your privileges in real time; if you're fine, do whatever you need to do. And this also forces attackers to learn how you use cloud servers. If I steal your SSH key right now, I can go in and do my thing in most cases. But this forces me as an attacker to also learn what you are actually doing. One of the things we all need to understand is that systems need to keep

learning. So there has to be a constant stream of data. You cannot just grab data from history files or from a syslog, train your system on it, let it loose, and then expect that people will not complain about it; you need to keep refreshing it with the way people are actually using the different types of servers. We can think about rule-based approaches, but we shouldn't obsess over them. Some things are very common sense and rules will work nicely there, but when you start to scale, how many rules will you write? It's not possible; you cannot predict. It's like asking somebody to look

into a crystal ball and say what's going to happen tomorrow. I don't know, so you can't really keep writing rules. This also helps with auditing shadow IT accounts, as we mentioned, so that's another benefit. Very quickly, let's go to the actual tools that we can use to build out this pipeline. Some freely available and actually very easy to use tools that you can get started with in like 15 minutes are scikit-learn and Weka. You can download them, and there are even tutorials that give you data sets and tell you: would you like to try

out expectation maximization, would you like to try out k-means, here's the data for it. So that's very helpful. The second thing — a lot of people have probably used it or heard of it — is Apache Kafka, which is basically a way to gather a lot of data from endpoints and pump it in a reliable, queued way to your system. We have used Kafka, and what we have found is that it's a good way to ship bits of information from a lot of endpoints to our analysis system. On top of that, there is another piece of software called Apache Spark. Again, people might have heard of it or used it. Apache Spark is a good way of doing

distributed number crunching, whether for statistical purposes, heuristics, whatever you need. So if you are trying to find, say, running means of something, because that's what your algorithm is based on, you can do that very fast, very reliably, with very low effort. These are again very useful. I mentioned Twilio just as an example; you don't have to use Twilio for sending SMSes or anything, you can use whatever you like. We have used Node.js internally for a couple of our services; you don't have to use it either. But this is basically the stack that we have, and the first three are the ones that help you develop the plumbing for this

entire system. Again, as I mentioned, when you start off with Weka or scikit-learn, look at SVMs, decision trees, AdaBoost stumps, things like this. They will save you time; you don't have to go and experiment with every single thing under the sun and then find out, well, this doesn't really work well for this data set. So, what does the plumbing ultimately look like? One thing we have not mentioned until now: in order for you to actually make this actionable, you need some kind of an SSH proxy, otherwise there is no real way for you to take action. So let's say you are going to type rm -rf

whatever the command is, I have to have a way to stop that command from hitting the server. If I don't use an SSH proxy, it's very hard for me to do that; I have to go to the server, install PAM modules, etc. I could do it on the server side if I wanted to, but the downside of that is that I am modifying my server images, which I don't want to do. That's just our use case. So, we use an SSH proxy. And the way we use it is, if you assign a server to us and say, you know what, protect this particular server, you can use dynamic DNS to basically tell the user, hey, yesterday you typed ssh server1.com; today you'll type ssh server1.yourcompany.com, whatever that might be. So that's one easy way to also put it into scripts and let them just go crazy through the proxy and do their thing. Because everything is now going through the proxy, of course the question is, is it a single point of failure? Yes. So you need to have multiple proxies. The great thing about that is you can always argue for more time or budget for doing that by saying, look, we need to have separation of duties and separation of traffic. So dev traffic goes to this proxy, test traffic goes to this proxy, production traffic goes to this proxy, customer traffic goes to this proxy, etc. You can neatly lay that out.

The potential workflow is, when you attach a particular server to this system, it goes in and automatically changes out all the SSH keys for the users that are on that server. Once it changes out the SSH keys, ultimately what will happen is you will type the new SSH command. You'll connect to the proxy using your existing key on your laptop; a completely new key pair opens up a tunnel on the other side and joins it in. And because we're joining it in, you can do command introspection, to say, what are you typing, we are not going to let this command fly, etc. That's one approach that you can take. And of course, as I mentioned, you can use SMS, email, whatever you want, to tell anybody that things are happening. But the interesting thing is that you don't have to just stop here. Once you have your system, why stop at cloud servers? You can also profile cloud apps. The way you behave on PayPal is different from the way you behave on Facebook. How many clicks do you do on Facebook in 10 seconds? How many clicks do you do on PayPal in 10 seconds? How

much do you scroll? What is the curvature of the mouse when you move things on Facebook and on PayPal? These are all markers that you can again pull out. And the way to implement a very simple solution: low-effort browser extensions. Browser extensions are really powerful. You can gather a lot of information from these to see what's actually going on. There has been a lot of talk about using typing patterns in the past. I just wanted to clarify that we've done some experiments, and our experience has been that typing speed is a very poor indicator and is fraught with false positives. For some reason we've never been able to use typing speed as a good indicator of whether it's you or not, primarily because people will make mistakes and their typing speed will change. It's always a little bit of a pain in the behind. But just because typing speed is not a good indicator doesn't mean typing patterns aren't. A good indicator is something like you going on a site. Think about this: you go on a site, you see a registration page which says first name, last name, and so on. You have a way of going through all these fields, by using the tab key or by using the mouse, and people do it in different manners. So that's one indicator which is actually pretty good: when you visit a web app, do you actually use tab to navigate through all the form entries?

The good thing about going with this approach is that you will not have any vendor lock-in. There is no software that you will have to buy. You can build your own thing. You can customize it however you want, for whatever you like. You will have the option of deciding how you want to do false positive mitigation. You will not be held back by a vendor saying, this is the way we do it and that's how it's going to happen. Your data is in your control, and ultimately you will have adaptive two-factor authentication with two-man protocols, three-man protocols, whatever you like. The point being, your infrastructure in your company is divided up into various buckets: it's really important, it's important, it's meh, who cares. That's kind of the spectrum in most companies. If it's really important, you have the choice of saying how much adaptive two-factor authentication to put in front of it, because you're doing it in a real-time way. And the good news is this is actually low friction. When we have used this with other people, they found that pushback from internal employees is much lower if you do it like this versus other challenges that we faced in the past.
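To make the bucket idea concrete, here is a minimal sketch of adaptive step-up authentication. The bucket names, risk levels, thresholds, and the `required_factors` helper are all illustrative assumptions, not the speakers' actual implementation:

```python
# Illustrative sketch only: bucket names and thresholds are made up.
RISK_BUCKETS = {
    "production": 3,  # "really important": always step up
    "staging": 2,     # "important": step up on anomalous behavior
    "dev": 1,         # "meh": just log
}

def required_factors(bucket, anomaly_score):
    """Decide how many extra auth factors to demand for a session,
    given the server's bucket and a 0..1 behavioral anomaly score."""
    level = RISK_BUCKETS.get(bucket, 1)
    if level >= 3:
        return 2  # e.g. a push approval plus a second approver (two-man rule)
    if level == 2 and anomaly_score > 0.8:
        return 1  # a single push/SMS challenge
    return 0      # let it through and log it
```

In a real deployment the anomaly score would come from the behavioral models discussed earlier, and the challenge itself could go out over Twilio or any other channel you prefer.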

Again, as I mentioned, we all need budget and time for doing something like this. Here are some pointers as to how to make the case to the C-level, to say, look, I really want to do this project; it's going to help us; this is what we're going to get out of it. So again, throw in compliance. You can say, hey, it's going to protect customer data; it's going to be harder for somebody, if they ever break in, to actually get access to customer data; better control; and ultimately time savings, because IT will be more efficient. So there are some points to be made in order to say, is this a good investment of money.

And now the ultimate point: don't end up on TechCrunch or Forbes or whatever. I'm not picking on Snapchat, but I'm just saying that things happen, and nobody likes bad publicity. This was a very quick presentation, and we didn't discuss the mathematical aspects of a lot of things. But the good news is that if you try those pieces of software, if you look at those decision trees and SVMs that I mentioned, it's actually very easy to get started. And the good news is that when we work in technology companies, we have access to this gold mine of data. Just think about the history data that you have on all your servers. That's it. The next step is going to a SIEM and pulling out information from Splunk or Sumo Logic, if your company uses anything of that sort. So the possibilities are actually endless. And the good news is you control the security envelope, not a vendor. You control what is right for your organization, not somebody else. So, I'll stop here, but if there are any questions, I'll be happy to answer.

We built our own... sorry, yes. The question is, what do you use as an SSH proxy? We used our own solution, but we are going to make it available for everybody from our GitHub page, so you don't have to develop one yourself. But realistically, you can use nearly any open source SSH proxy that's out there. Of a couple that we have looked at, one is called SSH Piper. It's actually very easy; it's like four lines of code that you need to put in to basically do command introspection and things like that.

How do you address it when a particular command you run allows other commands on the server? Like, I type vi; first of all, it's interactive. Yes. But also, then from vi I can potentially write to another file, right? How do you incorporate that kind of malicious action? That is a good question. We don't have a good handle on that. The question was, if you have an interactive vi-based session and you're trying to run commands from vi, how do you stop that? We don't have a good handle on that.

Yes, one point. So your solution, you mentioned you have a proprietary one; if you're not logging it yet [inaudible]

Yes, it does. Yes, it does. The point is that if you are not already looking at the logging from the SSH agent with the verbosity flags, you should look at it, because that is a good piece of information. That's a good marker. So definitely something to keep in mind. And does our own solution look at -R tunnels? Yes, it does.

There's room for machine learning in this type of thing, but there are a lot of issues with building tools. You mention a lot of tools, but what I really heard doesn't really scale, for many reasons. I think you're better off starting from threats that are specific to your environment, then identifying the data those threats go after, and then going after that data, rather than saying, what infrastructure are we deploying to pour this magic data into [inaudible].

Sure, I appreciate that. Yeah, sure.
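For concreteness, the kind of behavioral baseline the talk keeps returning to, profiling command sequences rather than writing rules, can be sketched without any ML library at all. The function names and the novelty score here are illustrative assumptions, not the speakers' code:

```python
from collections import Counter

def bigrams(commands):
    """Adjacent command pairs, e.g. ["ls","cat"] -> [("ls","cat")]."""
    return list(zip(commands, commands[1:]))

def build_profile(history):
    """Count the command bigrams seen across a user's past sessions."""
    profile = Counter()
    for session in history:
        profile.update(bigrams(session))
    return profile

def anomaly_score(profile, session):
    """Fraction of bigrams in a new session never seen before (0.0 to 1.0)."""
    grams = bigrams(session)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if profile[g] == 0)
    return unseen / len(grams)
```

A session full of never-before-seen command pairs scores near 1.0 and could trigger the 2FA challenge described earlier; in practice you would swap this toy scorer for an SVM or clustering model from scikit-learn or Weka once the plumbing works.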

Got any more questions? One in the back?

Right. So what we've seen is, in certain types of scripts that we have analyzed with people that we've worked with, when we look at our proxy, we see certain commands coming in in a chained fashion. So rm being chained with something, cat, etc. That's one indicator for us to say, when somebody's running a particular script that we know about, it usually runs with these three piped commands. But if that suddenly changes, that's an indicator for us to say, well, it's falling outside the classification envelope; we'll have to do something, like send a 2FA notice to the actual IT owner and say, this script is trying to run this particular command. Should we let it go through or not?

Yes, it has to; I mean, it's not as real time, but we take chunks of 5-minute intervals and we try to update the model, because it is realistic that people will change a little bit how they access the server, so you have to get that new information. Okay, thank you. Thank you. Did I get your name right? Yes, I've been practicing for an hour. Thank you.
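(Editorial sketch: the 5-minute-interval model updating described in that last answer could look roughly like this. The class, window size, and method names are illustrative assumptions, not the speakers' actual implementation.)

```python
from collections import Counter, deque

WINDOW_CHUNKS = 12  # keep the last hour's worth of 5-minute chunks

class RollingProfile:
    """Command-frequency profile rebuilt from recent 5-minute chunks,
    so gradual drift in how people access a server is absorbed."""

    def __init__(self):
        self.chunks = deque(maxlen=WINDOW_CHUNKS)  # old chunks fall off

    def add_chunk(self, commands):
        """Fold in the commands observed during one 5-minute interval."""
        self.chunks.append(Counter(commands))

    def is_unusual(self, command):
        """True if the command was never seen in the current window."""
        total = sum(self.chunks, Counter())
        return total[command] == 0
```

Because the window slides, a command that was normal an hour ago but has since disappeared from the traffic becomes "unusual" again, which is the trade-off between real-time detection and adapting to legitimate behavior change that the answer describes.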