
Security Tradeoffs In Elasticsearch - Philipp Krenn

BSides Luxembourg · 2019 · 32:02 · 96 views · Published 2019-11
Category: Technical
Difficulty: Intermediary
Style: Talk
About this talk
Elasticsearch achieved rapid adoption by prioritizing ease of use over security, making tradeoffs that later proved problematic. Krenn examines specific design decisions—binding to all interfaces, automatic clustering, running as root, storing credentials in plaintext, and omitting TLS by default—that simplified initial setup but left systems vulnerable to ransomware and data theft. The talk traces how the project evolved to address these issues, including introducing production-vs-development mode checks, certificate utilities, and eventually open-sourcing security features, while discussing the tension between usability, commercial incentives, and security-first defaults.
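The production-versus-development distinction mentioned in the summary can be illustrated with a small sketch. This is not Elasticsearch's Java code. It is a Python illustration under stated assumptions: the function names `is_production_mode` and `run_bootstrap_checks` are hypothetical, bind hosts are assumed to be literal IP addresses, and the list of failed checks is supplied by the caller.

```python
import ipaddress


def is_production_mode(bind_hosts, single_node=False):
    """Hypothetical helper: treat any non-loopback bind address as production.

    Assumes every entry is a literal IP address; a hostname such as
    "localhost" would raise ValueError here.
    """
    if single_node:
        # Single-node mode cannot form a cluster, so it stays in development mode.
        return False
    return any(not ipaddress.ip_address(host).is_loopback for host in bind_hosts)


def run_bootstrap_checks(bind_hosts, failures, single_node=False):
    """Hypothetical helper: fail hard in production, only warn in development."""
    if not failures:
        return []
    if is_production_mode(bind_hosts, single_node):
        raise RuntimeError("bootstrap checks failed: " + "; ".join(failures))
    return ["warning: " + failure for failure in failures]
```

The design choice this sketch mirrors is that the same failed check is only a warning while the node is reachable from the local machine alone, and becomes fatal once other machines could connect.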
Original YouTube description
The NoSQL ecosystem thrived on combining scalability and simplicity. This talk focuses on some assumptions we built Elasticsearch on, which helped the ease of use initially, but turned out to be less than perfect for security in the long run: Binding to all interfaces and broadcasting join requests to the whole subnet makes clustering simple. Running as root is the straightforward option. Using a general-purpose programming language for scripting adds lots of features. Guessing the content-type of a request is fine. Default passwords and clear-text password files are a reasonable tradeoff. Docker and distributed systems play well with your security efforts. Generating TLS certificates is easy. Everyone will turn on security, and defaults are easy.
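The content-type point in the description can be made concrete with a short sketch. Browsers can send a cross-origin POST with a `text/plain` body without a CORS preflight, whereas `application/json` triggers one, so a server that guesses the body format accepts requests a stricter server would never see. The function name `parse_json_body` and the plain-dict header representation are assumptions for illustration, not Elasticsearch's implementation.

```python
import json


def parse_json_body(headers, body):
    """Hypothetical helper: accept a body only if it was declared as JSON.

    `headers` is assumed to be a plain dict of header names to values.
    """
    content_type = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            content_type = value
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        raise ValueError("unsupported or missing Content-Type: %r" % content_type)
    return json.loads(body)
```

The cost is the one the talk describes: every client now has to send the header explicitly on every request.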
Transcript [en]

Good morning. How's everybody doing? Fine, more or less? More or less awake? Okay, yeah, it's still early. Let's see where we can take this. So let's talk a bit about trade-offs in security, which are always an interesting topic. Like we already heard, I work for Elastic, the company behind Elasticsearch and the Elastic Stack, and you might have seen something like this. This is approximately what we do, but I will mostly focus on Elasticsearch, which is the oldest product and also where security probably matters the most, because that is where your data lives. And so one thing I once saw that I found very interesting is this one, which is of

course a joke, because just not having SQL is not making you secure from injections; you can still have other kinds of injections. But the more general question then is: did NoSQL improve security overall? Does anybody have an opinion on that, or does anybody have any guesses? I would generally tend towards saying kind of no, because it's always like this: you have a new system, and it starts and it evolves, and security is probably not the main concern yet. So the thesis that I have here is that when you have an initial product (come on in, everybody), when you have an initial product like you had with the NoSQL data stores, then

what you really want is ease of use, so that everybody starts using you. So you will make some trade-offs that are probably not the most secure ones, but you just want to have the growth and be very easy to use. I think that was one of the main reasons why the NoSQL explosion happened ten years ago or so: it was just much easier to get started and solve some problems, but it definitely came at the cost of security. So let's take a look at some examples of things that have been kind of weird. Does anybody know what Elasticsearch was doing initially in terms of binding to interfaces, and what

it is doing now? Yes, exactly. It used to bind to everything, because that was the easiest way to get started: you didn't have to configure anything, it would just listen on all interfaces. We changed that in version 2, which was a long time ago already, because a lot of people started running it on public instances, and it was listening on everything and there was no security by default. That was not a great way to get started; or rather, it was a great way to get started, but it was not great for security. We changed that over time, so that binding is something you now need to change by hand, but initially it wasn't there

because it was much easier to get started like that. Does anybody know how clustering worked initially? That was another interesting one. Basically, what clustering was doing initially was that it would just scan the local subnet, and with any other instance with the same cluster name that it found, it would automatically form a cluster. That was again great for demos: you would have three laptops in the room, we would start them up with the default settings, and they would automatically form a cluster. Great for demos, but probably not so great for security. We even had people report that they would VPN into their production

system while they had Elasticsearch running locally, and it would form one big cluster. They thought they were running something against localhost, and then they would drop the data in production. So again, for security this automatic clustering is probably not the best thing. We got rid of that as well over time, but there might be cases where people did some weird stuff there. Then over time we added something we call production versus development mode. Generally, the assumption is that as soon as you bind to a non-localhost address, so that you could form a cluster, we assume that you're in production mode, and production mode will basically add some additional checks and

constraints, whereas development mode will just give you some warnings and tell you, well, maybe something is happening there. So this is the actual Java code, and Elasticsearch basically checks: do you bind to any interface that is not a loopback interface? Or we have a single-node mode, and single-node mode is just for Docker. Why do I need it for Docker? Because otherwise I need to get out of this stupid container (or not stupid container, but I need to get out of the container), and then it would always look like production mode in Docker. But most people just want to try something out in Docker, so we kind of built in a

little loophole. What that thing cannot do is form an actual cluster. So it can bind to a non-localhost interface, but it cannot form a cluster with other instances, because our assumption is always that for highly available systems you need more than one instance. So as soon as you form a cluster, you will be in production mode. In general, we assume that as soon as you could form a cluster, because you bind to non-localhost, it is actually production mode. And production mode has a lot of checks. Most of them are just for running it in a performant and secure way, but some of these checks will actually touch on security. So for

example, does anybody know what will happen if I try to run Elasticsearch as root, nowadays at least? Is that possible, or is it a good idea? No? Okay, we all agree on the no here. And can you do it if you really want to? No, you cannot. What we basically have is definitelyRunningAsRoot, which is the method call, which I find great. On Windows we don't know and we don't care, because there is no easy way to find out, I think. But on any Unix-based operating system there is Java Native Access, and with that you can basically check for some permissions, and we use that to figure it out. What we then do is, when the

process starts up, in the bootstrap checks as well, it throws a runtime exception and it will just exit, and it's over. If you try to run Elasticsearch as root, it will exit the process. I guess everybody here agrees as well that that is a good idea; nobody would disagree with that, and this is the right thing to do, right? Unfortunately, it's not always that easy, because then we had this one here. This is always good fun for security. I always say these are like cockroaches: things that you get rid of for some time, but they keep reappearing, like the cockroaches in the kitchen. Because then suddenly in our

Elasticsearch Docker repository people say, like, "this should be running as root, why do you make it complicated, who cares?" And then you suddenly have these problems reappear where people want to do this, and you have to argue why you don't want to allow the process to run as root. By the way, the only way to make Elasticsearch run as root would be to fork the code and comment out that runtime exception. There is no other way to force that. I also say that Docker often has this YOLO approach, like, we'll do something just to get it running quickly, and that is what

you see every now and then. Anyway, seccomp: does anybody know or use seccomp? Basically, you can filter which system calls are allowed from your process. So if you know that your process never needs to fork another process, you can just disallow that on Linux, and this is something that we set, for example, in Elasticsearch. This might be very tiny, but you can see, for example, here this is the call for fork and this is the call for execute. We just block those calls, so even if you take over the Elasticsearch process, you could never run these system calls, because we know we never use them, so we just forbid them. You could even disallow network access

for a process, which doesn't make much sense for Elasticsearch, but if you have a process where you know it will never need to reach out over the network, you could disallow that. That's seccomp and how to actually do that; we just call it the system call filter. Basically, we set up the profile of what should be allowed at the beginning. An important one: scripting. Scripting is always good fun, right? And everybody wants scripting. So what we did initially was add general-purpose scripting. Since Elasticsearch is a Java process, we added Groovy, because it's scripting on the JVM, and it was all running. Can anybody guess what happened? Yeah, people found ways to break out of

the sandbox, and lots of bad stuff happened. I think more than half of the security issues we had in the first couple of years of Elasticsearch were because of scripting, where somebody found a clever way to break out of the sandbox. We thought, well, you can only call this thing, and then they found another way to call other stuff and break out of that. It was kind of like whack-a-mole, and it's just not a good idea, because the scripting language was not made for the purpose we tried to use it for. For example, a friend of mine had an Elasticsearch instance running somewhere, and then it started sending so much spam that he couldn't

even SSH into it anymore, I think because somebody had broken in through the scripting. So what we then did is we invented our own scripting language. I'm not sure if that is the perfect solution that you always want to have, but that is what we did, and the general goal was security and performance. The nice thing is that by writing your own scripting language you can basically define: these are the things that are possible, and this other stuff is just not possible. Like, there is no way to call another class, because we just don't have the concept for that, which was one of the ways to break out of the

security sandbox; there was also another way. And then we got rid of all the other scripting approaches, which people didn't like, because they had to rewrite their scripts, but for long-term security this hopefully was the better approach. One thing Elasticsearch didn't do for a long time was require you to specify the content type. So for example, with this one here, I can just run curl and say, here is the body, but I don't need to specify that this is actually JSON; it will just accept it. Any guesses what went wrong afterwards, what people started doing or tried to do? There is this problem that plain text is treated as safe, and since

that is up to your browser, and since we never checked the content type, you could just send JSON as plain text. Elasticsearch would interpret that and we would just run your command, which again wasn't very smart, because you could just do something like this. For example, with jQuery, if you call this and just post a plain-text string, it would just create the document for you, which is not great. We changed that, but now everybody has to add the JSON content type to every single request they run, and people complain a lot about that, because you have to type so much more. This is again one of these trade-offs where you can see it's

coming back to you again and again: you want to start simple, and then you figure out all the loopholes you will run into over time, so you need to fix them. Does anybody remember the default credentials in Elasticsearch? Was anybody using them in some pen testing? Well, maybe, because for a long time it was elastic and changeme. Guess what: nobody changed changeme. Unsurprisingly, it was a pretty safe bet that those would be there. We got rid of them. The problem then is only: how do you bootstrap credentials, especially if it's a distributed system, and especially if you have stuff like Docker? Because people would say, okay, during the

installation process you just enter the password. But we thought: there is no installation process anymore, it's just an immutable container that you start up, so where do you put your passwords? So the way we approach it now is, if you run it through Docker, we will require you to have an environment variable, and through that you can pass in the initial password, and then with that bootstrap password you can replace it with proper credentials later on. But that bootstrapping, and getting that started, is actually pretty tricky, because a lot of people assume, well, during the installation process you do something, but with Docker or especially Kubernetes you don't have this initial installation process anymore. So that's

making it trickier as well. We also stored credentials: you could just add them in clear text in the elasticsearch.yml configuration file. Yes, you needed to be root to read that, but that's still not great. So now we have keystores, but right now they're only what we call obfuscated, and we will add proper password protection to those later on. So this is still a bit of a work in progress, without making it too hard for the users either. Who likes generating TLS certificates? Yeah, it's not a trick question, you don't have to show me. And you're probably the crowd that knows this much better than the average person trying to set this up. So

we try by now to wrap this in our own certificate utility that will run the right commands in the background, depending on your operating system. You just have to pass in some variables, and it will try to generate it the right way for you. But it's still a major pain point, and most people don't want to set up TLS, because nobody wants to deal with that, and I don't think we will be getting rid of that. Does anybody know how authentication works in Elasticsearch by default? It was one of the features with which we made a lot of money, which kind of made it complicated, because it was one of the things where we said, well, if you

want to run it yourself, it's easy to run behind a reverse proxy, and you probably shouldn't expose it to anything outside of your own network anyway, so that is normally kind of good enough protection. Especially since it's just plain HTTP: you have reverse proxies, and everybody knows how to work with those, so you can just terminate on the reverse proxy, you have basic auth, you have TLS there, and you're pretty much good. The problem is that as soon as you try to do stuff in Kubernetes, this concept breaks: there is no easy reverse proxy thing, and you need TLS in your cluster. So Kubernetes changed that, which is why we made security free. Also, before that, it was driving

a good amount of our revenue and enabled us to bootstrap the other products, but by now we have other areas where we can make money as well, so this is allowing us to do stuff like that. The problem then is: should it be on by default? We only made security free when we released 7.1, because that is when we brought out the Kubernetes operator. Turning it on by default would be a breaking change, because the default behavior would then be different, so we might only change that in the next major version. If you were a paying customer using the production setup, like the bootstrap checks for production, then

we would require security as well. Do we need TLS for security? Yeah, we enforce it: without TLS you cannot enable security. And that is another thing: we might enable security by default, but we will still require TLS. People are probably okay with credentials, but setting up the certificates is a major pain point, and we are still not really sure how to do that. I've seen that others are shipping default certificates with their stuff, but that's also a crap idea, because you're not winning anything. But we probably need to do something, because has anybody ever looked on Shodan for port 9200, with a response code like that? That is

the default Elasticsearch instance, and ransomware is still a thing. You could, for example, get one of these clusters, and then you do a cat indices, which is basically "show me all the tables", and then you see something that is called "please read". And, well, we're nice, so we will read what people suggest that we read. And if you read that (it might be a bit small, but you can see it here): your database is backed up, send us 0.5 Bitcoin to this address to get your data back. And then there's always the hope that they actually made that backup and you get your data back, because sometimes they just say they did. Well, you need

to download the stuff, you need to store it somewhere, and so they just deleted it instead. So there is always the hope that somebody really did that and you could get your data back. The problem that happened then was that Bitcoin was pretty volatile. Depending on where you were in the cycle, Bitcoin was crazily expensive and changing on a day-to-day basis, so then they said, send us the equivalent of, I don't know, three thousand dollars in whatever that is in Bitcoin today, because it was moving very quickly. And then the other problem was that if you had an insecure instance, you could run into the matryoshka problem,

like, somebody would take your data and leave a message, then somebody else would take that message and leave their own message, and then somebody else could come and leave their own message, and then you would basically need to pay all of them to get back to the original person with your real data, maybe. So you don't really want to get into that situation. And it even hit us ourselves. For example, we were running some test instances, and here you can see the German CERT sent us an email and said, well, you have an open Elasticsearch instance running there, because they started actively scanning German hosts to see if they have open

Elasticsearch instances, because so many people were ransomed. And, well, we got one of those ourselves. Those were test instances, so not a big deal, but yeah, something you should probably not do. So, starting off: if you have a product and you want to get a wide user base, you probably need to make it easy to use, to grow the user base quickly. The problem is that as soon as you break into more serious workloads and production, you probably want something that is secure and not just quick and easy to use. But it's kind of a fine line, because, A, it's a painful migration, and also, when do you kind of

do each step, and how far do you lean into the insecure region to make it easy initially? That's kind of the idea here. So I think that's the discussion part. We have spent exactly 20 minutes on the talk, and now I'm expecting 20 minutes of questions or discussion, or I don't know. I hope everybody is awake now, and I guess I'm passing on the microphone that the questions also require. Are you having one? I have another one. Thank you, Philipp. So has anybody had any security issues with Elasticsearch? Hands up, just out of curiosity. Or with any other NoSQL data stores; I take those as well. Yes? I had a question regarding the

privileges for indices. Okay. Because there are quite a few privileges that seem to be kind of duplicates: when you index a document, you have create, write, index, update, and all of those basically allow you to do an update. I wanted to know if there was a specific reason why you don't have a privilege that would allow you to index documents but not update them afterwards. Because the issue is that when you want to use Beats, because they are lightweight and you don't want to go through Logstash, then if you have a compromised host, even if you use client certificates or whatever, if you have that compromised host, you

have the issue that, because the IDs are typically not that hard to guess, one single host can basically rewrite your whole index, or try to. Now, you're in luck: we're just working on that. Okay. I cannot promise you the exact minor version, but I think it's coming out pretty soon. The next one will be 7.5; I'm not sure if it will be that or 7.6 or something like that. It required a bit of refactoring internally, but this is on the way, because this was a known pain point, basically, and there was no easy way to actually do that. I think the one thing that you could have done

is that you disabled reads, because then you would basically need to know the ID to actually replace the document. That's true, but I don't think that the IDs are so random. Okay, so you mean you'd have to guess and basically try to brute-force the IDs? Okay, yeah. I mean, it's not proper security, but it's kind of making it harder to replace specific documents, at least: you say you cannot read the data back, and then without the ID you cannot replace it. But a proper fix is on the way. I don't want to promise a specific version, also because stuff can always slip, but I know that the security team is working on that one

right now, because it has come up in the past. Thank you. Yeah, thank you. Thank you for the nice presentation. I'm a big fan of Elastic; I think I deploy an instance every week or so. As for the question you asked, whether anyone has had security issues: basically, before the recent changes that made security free, I think every user had security issues with Elastic, because you had to go out of your way to set up even the most basic types of security for your instance. And because of that, of course, you have Shodan and all the open instances and incidents. So my question is, and I hope this is not too

political: don't you think it's Elastic's fault, in a way, that they locked the most basic security functionality behind the paid tier and then left the product itself free to use? So people actually get the functionality, which is always the most important thing, for example for the business units, but the security teams are locked out, or for example don't have enough time to run after all the instances and set up all the reverse proxies and settings. So, I don't want to blame the users, because security is a hard problem, but for example MongoDB has had security for free for, I think, four-plus years, and they're being hacked left and right as well. So I

think that's kind of the nice counterexample: for at least four years, I think, security has been available for free for MongoDB, and still they get ransomed pretty much as much as us. At least that's my counterargument here. I mean, you can say that, but that's more of an escape, I would say, because you can always say that no product is secure enough. But you mentioned that you are releasing security for free because you want to improve the security of the customers and the others. So why not do it from the beginning? Why wait until you are in

version 7 to make a move like this? Well, if you run a regular cluster, the first thing is: don't bind to any internet-reachable interfaces. I think 90% or more of the breaches that we have seen would have been prevented if you didn't bind to any internet-accessible interface, so that would be the first step. I know there were a couple; some people told me, well, we had a firewall running, and then the firewall was misconfigured for an hour, and somebody got our data in the meantime. That's a bit unfortunate, yes. The other thing is, to be very honest, security was one of

the main drivers for the commercial products, and it basically allowed us to build a lot of the functionality that we have today. With the investment from the money that came from security, we can now build many other things on top of it, but we probably would never have gotten to, for example, the machine learning stuff, and also more of the security stuff that we have recently acquired, if we hadn't had that initial fuel keeping us going for a couple of years. It's a hard trade-off, but especially if you are an open source company, you need to make the money somewhere to actually keep the engine running. Can I ask one more question? I have been

wanting to ask someone from Elastic this face to face, actually. You don't need to answer, because I know it's more of a business question. As you said, the security features became free recently. Don't you think that's more of a response to Amazon AWS actually going out of their way and saying, we are just going to code this and put it out for free, so everyone who is interested in using the open-source version of Elastic now also has security by default for free? Don't you think it's more of, like, a catch-up, you know, trying to make sure that Amazon doesn't really take the customers away from Elastic, in a sense? You don't need to answer that. No,

so the thing is, to be honest, we evaluated with every major version whether security should be free or not. The decision was generally that it's not yet the point, but then with Kubernetes we kind of had to make the cut. That's also why we had this weird switch where it changed in the first minor version of 7, and that release was never planned, actually. What is 7.2 should have been 7.1. But since we had this Kubernetes operator that we were bringing out then, and you couldn't run in Kubernetes in a secure fashion, that's why we

basically had to push that out. So this was really a Kubernetes decision; otherwise it probably wouldn't have happened at this time. Thank you. Thanks for the talk. So mine is more of a query: why is it so hard for you guys to bootstrap TLS? Because, I mean, it's been done before many times, and there are plenty of solutions. I mean, devices bootstrap TLS all the time. Yes, it's self-signed, but you can still create a certificate based on one, for instance, and if you're in a public domain you could potentially plug in things like Let's Encrypt. So I'm just curious why it is so difficult to

bootstrap that TLS part? I think one part is that, generally, we don't want to have self-signed certificates if we can avoid it. And then, when you boot up the cluster, it will sometimes be very hard to figure out where people will try to connect to the cluster, so you will have a hard time figuring out the right certificate in the bootstrap process. Where we do have bootstrapping of the certificates, for example, is with our Kubernetes operator now: it does create self-signed certificates automatically, because that is a bit more of an automated environment and abstracted away, and then we don't have to care about all the different operating systems that we

support and all the different installation methods, because that starts with Homebrew and goes to tar.gz, then Debian, RPM, and MSI, and it's just a lot of different operating systems and platforms. I'm not an expert on TLS, to be honest; I just know it from my colleagues, and they always say it's kind of a hard problem. We're trying to make this better and maybe more automated, but so far it has probably also failed because of resourcing, because it probably involves a lot of testing, seeing what breaks, and helping out people who break TLS in a weird way. Because for security, I think half of the questions we get are probably already TLS-related, even with the

wrapper that we have in certutil, this one, which wraps the security. Any other questions?
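The difficulty described in the answer above, not knowing at bootstrap time which names clients will use to reach the cluster, can be sketched as a coverage check between the names placed in a certificate and the names clients actually connect to. This is a deliberately simplified illustration: real TLS clients follow stricter matching rules (RFC 6125), the function names `san_covers` and `uncovered_names` are hypothetical, and all hostnames in the usage note are made up.

```python
def san_covers(san_dns_names, host):
    """Simplified check: exact match, or one leading '*.' wildcard label."""
    host = host.lower().rstrip(".")
    for name in san_dns_names:
        name = name.lower().rstrip(".")
        if name == host:
            return True
        # A wildcard only stands in for the single leftmost label.
        if name.startswith("*.") and "." in host:
            if host.split(".", 1)[1] == name[2:]:
                return True
    return False


def uncovered_names(san_dns_names, client_hosts):
    """Return the client-facing names the certificate would not cover."""
    return [host for host in client_hosts if not san_covers(san_dns_names, host)]
```

For instance, a certificate generated for `es01.internal` and `*.es.example.com` would leave a client connecting via `search.example.com` with a hostname mismatch, which is the kind of breakage an automated bootstrap cannot easily predict.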

In advance? Yes, yes, on the schedule. In terms of resourcing, I'm just curious to know how many people, or how it is split in a team, like people concerned with security: what, in your experience, do you need in terms of resources to follow the feature development lifecycle that you're pushing out, so all the new stuff, and how many people do you need for security? You mean as an end user of Elasticsearch or as Elastic? So, I mean, security is getting a bit wider now, because we have so many different areas of security. We have the core Elasticsearch security people; that's four or five full-time people or something like that. But then of course

the SIEM and Endgame and all these are totally different teams, and these are way more people, or internal InfoSec; all of them are different ones. But we have a couple of people full-time on security, and also globally distributed, which is a bit of a feature. Ideally we have somebody in different regions and time zones, so that whenever some bad security stuff happens they can take over, or they can hand over to the next person. So we have Tim in Australia, we have Yanis and one guy in Romania, we have at least one or two guys in the U.S. We're getting too large, I don't remember all the names anymore, but we have a couple of

folks all around working on that. And it's true, for a long time it was only one person, and that was a bit of a bottleneck: he basically had to decide, am I fixing bugs or am I developing features, and it was a complicated trade-off. Luckily we are in a better position now. Other questions? If you want to have stickers, I even brought the big ones. You probably know that one, the BELK or the ELKB: it's a bee and an elk, and that's kind of where the stack came from. So if you want to have stickers, grab stickers afterwards. Final chance for questions. Did anybody get hacked using a NoSQL data store?

Okay, nobody dares. You can tell me in private afterwards; I'm always curious. Anything else? Normally I try to take a picture with you, so I can prove to my colleagues that I've been working today, because, you know, we're a fully distributed company. Okay, today I have Aaron here, but normally I'm pretty much alone and nobody knows where I am. So let's do this one. I'm wondering if I can get you all in one picture. Wave! Very good. So this was your last chance: any questions? No? Okay, thanks a lot. I think we have a coffee break. Yeah, thank you.