
Hi and welcome. I'm Philip, and I want to talk a bit about security tradeoffs in Elasticsearch. So let me share my screen and we'll take it from there. Here we go: security tradeoffs in Elasticsearch. Why am I picking this topic? I work for Elastic, the company behind Elasticsearch; my official title is developer advocate, so I mostly talk about the good stuff that we do. But today I want to take the opportunity to talk a bit about the stuff that maybe wasn't always that great, or things that we have learned over the past years. So this is a bit of a story about how you can improve your own product over time.

If you have never heard of Elasticsearch: this is the Elastic Stack, where Elasticsearch is sitting in the middle as kind of like the data store. It has a REST API, it's scalable, and it's very widely used. If you search on GitHub, Wikipedia, or Stack Overflow, behind the search box there is always Elasticsearch doing the search for you. It's also very widely used in many other use cases around logs, and in security use cases. But that shouldn't be the topic; I really want to talk a bit about the evolution of Elasticsearch, or more broadly, how NoSQL data stores approach security generally. So let's dive into that.

This one is of course a joke: yes, the best argument for NoSQL is that you can't have SQL injections if you don't have SQL. Which is kind of true, but that's not really the point. The idea here is really more: did NoSQL improve security overall for data stores, yes or no? And for that we probably have to go a bit through the development and where we are going. The thesis I want to offer is that initially you're very focused on ease of use, to grow and get your product out there, and as you progress things will change; we will come back to this thesis at the end. So we start with the premise that ease of use to grow is kind of
fundamental. Part of that was that initially, Elasticsearch was always binding to all interfaces, which was great if you wanted to demo some clustering: you would just start the binary, it would bind to all interfaces, and then you could easily form a cluster with other nodes. Obviously, for security reasons this is not what you want, and since Elasticsearch 2.0, which was years and years ago, Elasticsearch is not doing that anymore; it only binds to localhost by default. We'll get back to why that is important and why this still doesn't solve all the problems, but this is something that you really want, so as not to surprise people: they just start your product, and suddenly they are running a service that whoever wants to reach it can reach. You don't want to be in that position.

The next thing that we did initially in Elasticsearch was clustering automatically. Besides binding to all interfaces by default, Elasticsearch would also listen and broadcast on the local subnet to see: are there any other Elasticsearch nodes with the default cluster name? And every cluster would have the same name by default. So it was kind of a funny story when I was running my first Elasticsearch training, way before joining Elastic: I had a couple of students in one subnet, and each one was starting their own Elasticsearch node, and those would just cluster automatically, or form one big cluster, because the binaries were bound to all interfaces and each node was just scanning the local subnet for other instances. Everybody was using the default configuration, so they had one big cluster, which was an interesting experience and showed how easy that was. But it also showed the problem: it might create chaos. If someone inserts some data and somebody else tries to delete data, and they think they're doing that locally, they might be doing that to the data of somebody else. Not something you really want to have. It might be even worse: let's assume you VPN to your production system, and then your local test installation clusters with the installation there. That might lead to a very bad day when you think you drop some data locally and you actually do that in production. So that's another thing we stopped doing at some point. It was very easy to get started, but in terms of security hygiene, or generally getting to a state that is predictable, that is not where you want to be.

Also, what we added over time was the so-called production mode. The assumption is that when you're in development, you don't really care so much about some settings, like file handles and some configurations, whereas when you go to production you really want to have those set up correctly, because otherwise you will fall on your nose pretty hard, and people will always complain if that happens in production. So we would rather fail quickly in production, have people fix that, and set up a cluster properly, rather than run into unexpected issues later on. How that works: here is some Java code from the current release. Basically, what we are checking is whether Elasticsearch is only bound to the loopback interface, which is the default; then we're in development mode. If we change that and bind to other interfaces, so we could form a cluster, then we assume we are in production mode.
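The decision just described can be sketched in a few lines of Python. This is a reconstruction of the idea, not Elasticsearch's actual Java code; the file descriptor check and its 65535 threshold are one illustrative bootstrap check, and the function names are made up for this sketch.

```python
import ipaddress

def is_production_mode(bound_addresses):
    """If every address the node is bound to is a loopback address, assume
    development mode; binding anywhere else means the node could form a
    cluster over the network, so treat it as production."""
    return not all(
        ipaddress.ip_address(addr).is_loopback for addr in bound_addresses
    )

def start_node(bound_addresses, open_file_limit):
    """Enforce an example bootstrap check (file descriptor limit) only in
    production mode; in development mode, just start."""
    if is_production_mode(bound_addresses) and open_file_limit < 65535:
        raise RuntimeError("bootstrap check failed: max file descriptors too low")
    return "started"
```

So `start_node(["127.0.0.1"], 1024)` starts fine in development, while `start_node(["10.0.0.5"], 1024)` refuses to start, which is exactly the "fail quickly in production" behavior: there is no flag to opt out, only fixing the setting.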
And then you need to fix the specific settings, which I will show on the next slide; you need to fix those to actually start the process, otherwise the process will terminate. Why is it not a configuration flag? Because otherwise everybody would just comment out the configuration flag and say, "oh no, I'm in development mode, I don't want to fix this stupid thing right now." But for everybody's sake you want to make sure that they are really in development mode, or if they are not, then you want to fail hard and not give any workaround. So there is no way around that. Here are some examples: file handles, specific garbage collectors in the JVM that we know might corrupt your data, things that will make your day terrible in production later on. That's why we check them upfront, in those so-called bootstrap checks.

A common question is: can you run Elasticsearch as root? No, you cannot, and obviously you should not, but it is also not possible. This is the example in our code where we check if you're running as root: we're using the Java Native Access (JNA) features to check that. Down here, if you're on Windows, there is no root user and we don't check this; but on any Unix operating system, where you have the concept of root, we will check that. And if this is definitely running as root, if this returns true, then we will just throw a runtime exception and say: you cannot run Elasticsearch as root, because nobody should do that, right? And everybody agrees that this is a very bad idea if you try to do it. Well, Docker comes along and people get interesting ideas, and we see that on our issue tracker every now and then. Admittedly this was a little while ago, but here somebody commented, "well, this is merely annoying, I want to run my process as root." If you have any security background, you're probably like, "yeah, no." I always call this the YOLO mode.
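The same root check can be sketched in Python. Elasticsearch does this in Java via JNA, as described above; this is just a sketch of the behavior, and the parameters exist only to make it easy to test (the function name is made up here).

```python
import os

def ensure_not_root(euid=None, os_name=os.name):
    """Refuse to start when running as root. On Windows there is no uid 0,
    so the check is skipped, just like in the talk."""
    if os_name != "posix":
        return  # no concept of a root uid here; nothing to check
    if euid is None:
        euid = os.geteuid()  # effective uid of the current process
    if euid == 0:
        raise RuntimeError("cannot run elasticsearch as root")
```

Calling `ensure_not_root(euid=0)` raises, with no configuration flag to get around it, while any non-zero uid passes silently.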
So what I always imagine is that people assume their data store has something like this YOLO mode, and we don't run that way.

Another thing that caused a lot of the bad security issues that Elasticsearch had over its lifetime was scripting. The obvious choice is to use a general-purpose scripting language that is out there and that people know; it's easy for you to integrate, and it's easy for your users. The problem is that locking down a general-purpose programming language is surprisingly hard, because there are often ways to break out of the sandbox, or to use some reflection to call something that you shouldn't be calling. That's why a lot of the really bad security issues that Elasticsearch had over its lifetime were related to scripting. So what did we do? We created our own scripting language, which has the slightly unfortunate name Painless. Often people will complain that it is actually not that painless to write, but rather painful. Though the backstory is: the creator of that scripting language has chronic back pain, and his dream is to be painless; that's how he came up with the name. It's not because we think that scripting language is actually that painless, but about that we could argue. The goals here were: we want a secure language, and we want a performant language. The language was really written just for Elasticsearch, to expose only the features that we deem correct or usable in a data store. So there is no way to break out of the sandbox, because we control what is available in the language. You can also do some nice checks around errors you might otherwise run into later, and you can just block specific operations, like recursion. Performance, making sure that stuff is cached and pre-compiled so every iteration or call is fast, was another design goal of this new scripting language. There were other widely used scripting languages before, but we removed all of them, mostly for the sake of security, because again and again they were causing trouble.

Another thing that we learned over time is that leniency is the devil, and one place where we were lenient was content-type guessing. You could just run a curl command; in a relational database this would basically be the equivalent of "SELECT * FROM" all the tables. So we're using curl and running that query against Elasticsearch, and we're sending a JSON document, but we don't provide any content type. Elasticsearch would apply a very simple heuristic: is the first thing in there an opening curly brace? Well, then I guess it will be JSON, and it will try to interpret it as JSON; it will try to sniff that. You can probably already see what might go wrong here. If you have text/plain, that is treated as safe in CORS, and that can lead to stupid security issues. For example, in this very simple example with jQuery, what we do here is an Ajax call: we do a POST against, for example, localhost:9200, the default port where Elasticsearch would be listening, and this would insert a document into Elasticsearch right away, because text is considered safe whereas JSON wouldn't be. So with the right content type this wouldn't work; but since we were sniffing the content type, this request would just work, and you could easily delete data in a similar way, or write to another endpoint. You could do some cross-site request forgery because of that. By now, you need to define the content type when you do a request.

Something that everybody knows is a bad idea: default credentials. We had those for a long time, and we called them elastic and changeme, and obviously nobody ever changed changeme, so it was pretty much pointless. We needed to find a better solution. And then people say, "well, this is easy, you just run some interactive thing in the installation process." The problem is it's not necessarily that
easy. Looking at all the ways to operate and install Elasticsearch, maybe you don't have an interactive installation process. What happens if you run a Docker container, or if you use a Kubernetes operator? Many of these are different in that you don't have anybody typing anything interactively, and you need to find a generic way. One way we approach this: you can set an environment variable with a bootstrap password, and with that you can then create your other accounts and go from there.

Which leads us to another point: clear-text passwords are obviously not a great idea. For that we have added the option of a keystore, though so far the keystore is only obfuscated and not properly encrypted. We will add password protection to it in the future as well, but for now it only obfuscates your credentials; still, that's better than plain-text passwords.

TLS certificates are always a joy, especially when you have lots of users and many of them are not so familiar with TLS and all the errors you can run into; Java can also be a joy to work with around TLS. So we have our own binary to actually generate certificates for you, and you should use that. We have by now invested a lot of time to make the error messages better and actually help you get your certificates, but TLS is, especially around security, unfortunately still a common pain point.

Authentication: by now, authentication is freely available, since 6.8 and 7.1. Before, it was one of the main things that we could monetize and build the other features on. And since it's all HTTP, you could always put Elasticsearch behind a reverse proxy, do basic auth on the reverse proxy, terminate TLS at the reverse proxy, and close down the data store with a firewall. But by now, and especially because of Kubernetes, running a reverse proxy in front of it is not that simple anymore, including who communicates with whom. Because of that, we have made the security features, like role-based access control and TLS, freely available, and it is highly advisable to use them. They are not on by default, unless you have a paid version, because then we enforce that. Why? Because we only added them in 7.1, and enforcing authentication would be a breaking change, so we need to wait for the next major version. Whether that will have authentication enabled by default we're still working out, because if we want that, we probably want to enforce TLS as well, and that might be a bit hard for some people to swallow when setting up initially. But we will see, and
hopefully we will get there, but we'll see.

Which leads us to the fun ransomware problem: I have my data store, and well, it only binds to localhost, but I just want to let the world know that I'm running this, so I bind it to all interfaces. I don't set up a firewall, I don't set up a reverse proxy, I don't set up security, I just open it up. And then you can just find those on Shodan: put in the default port of 9200, look for anything that reports back with an HTTP 200 OK, and you will probably find some Elasticsearch instances. Every now and then you find actual data and have a breach; that is more or less a common occurrence. A lot of them are by now probably also forgotten test instances, or honeypots. But what you will then normally get, if you just query that vulnerable installation: here, for example, I just list all the indices that it has, and there is this one index called "please read". Well, we're friendly, we're interested; it has one document, so let's read that one document. What you would then often get is something like this: please send half a bitcoin, in this case, to this address, and then you might get your data back. Or not. The general idea is: somebody downloads your data, keeps it safe, deletes it on your server, and leaves that message. If you pay, you will get your data back. Maybe. Maybe they have never taken that backup and will just disappear after you wire them the money. Also, the price of bitcoin tends to be a bit volatile, so there was a time when the ransom notes actually said "wire the dollar equivalent of this in today's bitcoin," because it was fluctuating so much from one day to the next that it was unpredictable what you would need to pay. Maybe by now it's a bit more stable again, but yeah, great use case for bitcoin.

And then you might run into the matryoshka problem. The matryoshka problem is that somebody takes the data and leaves the message on how to get your data back; then somebody else, since your data store is still unprotected, might take that message, leave their own message, and say, "if you want the original message on how to get your data back, you pay me first." You can put one ransom attack inside the other, and then you would need to pay all of them to maybe get your data back in the end. That's what I call the matryoshka approach, which might not be all that fun. And we did run into that ourselves, when we ran some test installations and didn't bother to secure them. The German CERT actually scans their IP space to see if anybody has any open Elasticsearch instances, and they might send you this nice email to say, "hey, you have a problem there, you should secure that." And then ideally you do, and you don't get ransomed.

So, coming to the end here: starting off being simple to start with is probably the right way to actually get the broad user base, but once you have more important data, more secure use cases, more enterprise customers, then you really want to have the more security-aware settings for running critical workloads. And I think this is kind of like a normal
maturity process: you start out being very easy to get started with, and move to a stricter setup over time. I'm kind of afraid there is no real shortcut to that; this is just a learning experience that you will need to go through as a product. At least that's my thesis; maybe I'm right, maybe I'm wrong. We'll have a discussion afterwards, so let me know what you think.

To wrap up, we have seen that Elasticsearch does not bind to all interfaces anymore, only to localhost. It doesn't cluster automatically anymore; that was convenient, but it was also dangerous. Bootstrap checks are checks to make sure you run Elasticsearch the right way, and, for example, that you don't run it as root; that is never possible. Also, scripting was a common pain point that we fixed by writing our own scripting language, which introduced new pain points, but at least it seems to have fixed the security problem pretty well, because we haven't had any security issues that I would be aware of since we introduced our own scripting language; so from the security perspective, that worked. Content type: requiring the right content type and not trying to guess will also help your security, thanks to all the fun cross-origin requests that browsers can do. Having default and clear-text credentials is clearly a bad thing. Requiring TLS, especially for anything that can form a cluster over the network, is probably what you really want. And yeah, authentication is a tricky subject that you should do properly; hopefully we'll get to the point where we have it enabled by default, without making the bootstrapping process of a cluster too hard. Hopefully we will get there as well.

And that's pretty much it; that's kind of the learning experience that we had over time. I hope you have learned something as well. Let me know in the questions, and I'm happy to discuss with you. Thanks a lot for joining!
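As a small addendum to the content-type lesson from the talk, here is a sketch of the lenient heuristic versus the strict check in Python. This is a reconstruction of the idea described above, not Elasticsearch's actual code, and the function names are made up for illustration.

```python
def sniff_content_type(body: bytes) -> str:
    """The lenient heuristic described in the talk: if the payload starts
    with an opening curly brace, guess that it is JSON."""
    return "application/json" if body.lstrip().startswith(b"{") else "text/plain"

def accept_json_request(headers: dict, body: bytes, lenient: bool) -> bool:
    """Strict mode only accepts an explicitly declared JSON content type.
    Lenient mode falls back to sniffing the body -- which is what made the
    cross-site trick work: browsers may send text/plain cross-origin without
    a CORS preflight, and the sniffer happily upgraded it to JSON."""
    declared = headers.get("Content-Type", "")
    if declared.split(";")[0].strip() == "application/json":
        return True
    return lenient and sniff_content_type(body) == "application/json"
```

With sniffing enabled, a body like `{"user": "x"}` sent with no content type at all is accepted as JSON; in strict mode the same request is rejected unless the client explicitly declares `application/json`, which is the behavior Elasticsearch moved to.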