
The Secure Metamorphosis: Streaming Logs with Kafka and TLS

BSidesSF · 2019 · 29:35 · 218 views · Published 2019-03 · Watch on YouTube ↗
Speakers: Tyler Paxton (Distil Networks)
About this talk
Apache Kafka is a widely adopted pub/sub messaging platform that can scale to handle huge volumes of data. It’s a powerful technology but notoriously difficult to configure, especially when it comes to Transport Layer Security (TLS). In this session, we’ll cover TLS best practices that yield a secure and compliant system, as well as critical techniques to maximize performance.
Transcript [en]

Good afternoon, everyone. We're ready for the next session, from Tyler Paxton, on The Secure Metamorphosis: Streaming Logs with Kafka and TLS. Welcome.

All right, thank you. Today we're going to talk a little bit about Kafka. The few things I'm going to run through are a quick introduction of myself, the data that we deal with, what Kafka is and what the security considerations for Kafka are, how to secure Kafka, and then some of the lessons we've learned as we've been doing this at our organization. They asked me to mention as a reminder that there's a Slido; that's the tag for the room if you want to submit questions, and we can review those at the end.

I work for Distil Networks; we protect against automated threats. I'm a product manager there, responsible for our research and our data, so we have a lot of data flowing through. We sit as layer 7 inline bot mitigation, and some of the things we protect against are people doing account fraud, web scraping, card cracking, or any other type of abuse you would use an automated piece of software for. Because of that, we collect a ton of data. We use it for monitoring our systems, for visualizing the data, and for doing detection.

We also use it to provide professional services and consulting to our customers. As for the pieces of data we have: we do web, mobile, and API protection, we do browser interrogation, and we have a mobile SDK. All of that data flows in to us, and we consolidate it and use it for those various purposes. We take in about five billion messages per day, and that's growing rapidly. Our messages are typically in some structured format like protobuf or JSON, but they come in in various formats depending on the part of the system. These are examples of some of our producers.

There are different ways you can integrate with Distil: we have a global CDN, you can have your own virtual private cloud, you can have a dedicated device in your infrastructure, or you can take our connector and install it into your application. All of those are constantly reporting telemetry and access logs. On top of that, we have the browsers, where our JavaScript fingerprints the device so we can track it; all of that data comes back to us and includes information about the browser, plus biometric data we gather to determine whether you're a human or not. And the same sort of telemetry data comes from the mobile SDK.

All of those things produce logs, and there are a lot of different things we want to use those logs for. I've mentioned a few: monitoring; enriching the logs with additional information to make analysis more complete; machine learning and detection. It all eventually ends up in our data warehouse, which is used for reporting, for dashboards in our portal, and by our analysts. Now, if everything that wants the logs had to fetch them without something like Kafka in place as a messaging broker, everything would need to know where the logs were and try to gather them itself, and that gets pretty unwieldy and messy.

That's where Kafka comes in. Just for a sense of the room: how many people here are familiar with Kafka or have used it? All right, cool. So, not Kafka the author; rather, Kafka the pub/sub message broker. Some basics: there are two types of clients. Producers are the ones publishing logs, and consumers are the ones consuming them. The logs that are sent are considered messages, and messages can be sent individually or in batches, so you can reduce how often you're calling out to the broker.

The message schema can be whatever you want to put in there; you can even have no schema if you want. Kafka is organized by topics: you publish to a specific topic, and you subscribe to a specific topic to consume those logs. Once messages land in a topic, the data is broken out into partitions; Kafka uses sequential writes to disk to get high performance. Those partitions are spread across multiple brokers, and consumers can form a consumer group and consume from each of those partitions; we'll look briefly at what that looks like. Consumers then use an offset, so they know where they are in the log stream they're consuming.

That also gives you some flexibility if you need to go back and re-consume: each consumer group has its own offset, so it knows where it is in the stream of data. This is roughly what it looks like. On the left you have the producers, on the right the consumers. Producers publish to a leader broker for each topic, and the leader broker can replicate the data so that if one broker dies, the data persists; the consumers consume from the lead partition for that topic.
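
To make the per-group offsets concrete (this isn't from the talk; it's a minimal sketch using the stock CLI, with a hypothetical broker address and group name), you can ask Kafka where a consumer group is in each partition:

    # For each partition: the group's committed offset, the log-end offset, and the lag
    bin/kafka-consumer-groups.sh --bootstrap-server kafka.example.com:9092 \
      --describe --group my-consumer-group

The output lists each partition's current offset, log-end offset, and lag, which is exactly the bookkeeping described above.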

This gives an idea of what the partitions look like: logs are just written sequentially at the end, and you keep track of where you are as you consume them. All right. With Kafka, you have Kafka itself, the cluster and the brokers within it, and you also have ZooKeeper. ZooKeeper helps orchestrate the work the brokers are doing; it's a distributed, open-source coordination service, and it makes sure you're coordinating across each of the brokers. It also manages a couple of other things we'll talk about, like access control lists. With Kafka in place, the picture can look a little cleaner: you have one Kafka service with multiple topics.

In reality it might be a little messier than that. You're not going to have just one topic that everything pushes to; otherwise you'd have to spend a lot of time parsing logs and figuring out which ones to consume and which to throw away. So you segment them, so that each service only consumes the logs that are important to it. Today we're going to talk about securing Kafka. Some of the reasons you might want to: it's just good security hygiene; you might have compliance requirements around PCI, GDPR, or PHI, like HIPAA;

and you also want to avoid the problems you'd have if you just let anybody publish; you could get abuse and contaminated data. There are several things you need to consider when securing Kafka. The first is encrypting the data in transport: data coming from a producer into the brokers, going out of the brokers to the consumers, and also moving from server to server. You also have authentication, making sure you're authenticating the clients communicating with you; when I say client here, I mean either the producer or the consumer.

Authentication, again, is just making sure you're communicating with who you intended. You also have encryption at rest, which ensures the data is protected while it's stored on disk. And then you have authorization: you might have encrypted communication and know who you're working with, but you still want to limit them to certain privileges within the service. Out of the box: for transport encryption, all messages are sent in plain text by default; for authentication, none is required by default; for encryption at rest, no special accommodations are made within Kafka (we'll talk about what the options are); and for authorization, it's open by default.

So let's take a look at some of these, starting with the Kafka defaults. Here, the only thing I've changed is that instead of listening on any address, I listen just on my Kafka host; this is a single EC2 instance. You can see I'm listening in plaintext on port 9092, and ZooKeeper is on port 2181; we'll talk a little later about securing ZooKeeper and the challenges there. Again, there's no authorization or authentication by default, which means anybody can create or delete a topic.

Here's what that looks like. I have a sample Kafka client connected to my Kafka service, and I've just asked it to create a topic. No authentication is going on, and once I've done it, I get a created topic and everything's happy. This is just out of the box: any client can connect, any client can produce messages, and anybody can consume them. Up here on the top is my producer: I connect, I get my chevron prompt, I type a message in, and when I get another chevron, that confirms my message was published, and we can see it being consumed below by my consumer.
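
The commands themselves go by quickly on screen; a minimal reconstruction of this default, unauthenticated demo with the stock CLI tools (hostname and topic name are placeholders) would be something like:

    # Create a topic; no credentials needed, since everything is open by default
    bin/kafka-topics.sh --zookeeper kafka.example.com:2181 \
      --create --topic test --partitions 1 --replication-factor 1

    # Publish: each ">" chevron means the previous message was accepted
    bin/kafka-console-producer.sh --broker-list kafka.example.com:9092 --topic test

    # Consume in a second terminal
    bin/kafka-console-consumer.sh --bootstrap-server kafka.example.com:9092 --topic test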

Again, with the data in plaintext in transport, we can see everything being sent; it's prone to sniffing and tampering, and PII can be exposed. So here, say I'm sending social security numbers across the wire with my publisher. If we send those, I can just sniff anything going to my Kafka server, and in the bottom right you can see the social security number show up. That is not my actual social security number. So, Kafka transport encryption: Kafka has normal transport encryption through TLS, and you can set it up one-way, meaning

the clients only need to encrypt what they're sending to the brokers, or as mutual TLS. It supports TLS 1.0, 1.1, and 1.2. As a quick TLS overview of what this looks like: on the left I have a Kafka server, on the right a Kafka client. The Kafka server will have a keystore and a truststore; the Kafka client, in this example, only really needs the truststore. The Kafka server has a certificate signed by our certificate authority (or your certificate authority), the Kafka client has the public CA certificate, and when the Kafka client wants to initiate communication,

it gets back the signed certificate, validates it, and then initiates secured communication. This is an overly simplified version of TLS. So let's look at what you need to do to set Kafka up for this one-way encrypted communication. First, identify your CA; hopefully you have a CA at your organization, or you can create your own, which is what I'll do in these examples. On the server you'll generate a Java keystore, generate a new key, generate a certificate signing request, and get the signed server cert back from your CA. Then you'll

install that server cert into your keystore, install the public CA cert into your keystore, and generate a Java truststore with the public CA cert in it as well. Then you go into Kafka and configure the server properties, ensure the new SSL port you've configured is open, and restart your server. Let's run through those steps quickly; I'll blast through them because it's all normal CA and cert work, but just for an idea: here I'm generating a new keystore, and I'll also look at what I have in it.

After I generate a new key, I can see I have a key for my host, kafka.tylerpaxton.com. Then I import the root certificate from my public CA, and I request a signed certificate for the key that's in my keystore. Now I have a certificate signing request, and I send that over to my CA. If you're paying attention, this same server also has my CA cert and key on it, so it's pretty easy for me to do; normally you would send the request to whoever is responsible for the CA in your organization.

So I generate that, and now I can verify the signed cert: I have a signed cert for my host from my CA, and I import it into my keystore. Now my keystore holds my key, the signed cert, and the cert from the public CA. Again, that's all pretty straightforward if you've worked with any sort of encryption in the past.
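
The individual commands flash by in the demo; a representative sketch of these server-side steps with keytool and openssl, assuming a local CA whose certificate and key live in the files ca-cert and ca-key, looks like this:

    # Generate a keystore containing a new key for the broker host
    keytool -keystore kafka.server.keystore.jks -alias broker \
      -genkey -keyalg RSA -validity 365

    # Create a certificate signing request for that key
    keytool -keystore kafka.server.keystore.jks -alias broker \
      -certreq -file broker.csr

    # Sign the request with the CA (normally whoever runs your CA does this)
    openssl x509 -req -CA ca-cert -CAkey ca-key -in broker.csr \
      -out broker-signed.crt -days 365 -CAcreateserial

    # Import the CA cert, then the signed cert, into the keystore
    keytool -keystore kafka.server.keystore.jks -alias CARoot -import -file ca-cert
    keytool -keystore kafka.server.keystore.jks -alias broker -import -file broker-signed.crt

    # Build the server truststore holding the public CA cert
    keytool -keystore kafka.server.truststore.jks -alias CARoot -import -file ca-cert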

The next thing to do is open up the server.properties file. Initially it was set up to listen only on port 9092; I'm going to configure it to also listen on port 9093, and keep listening on 9092 as well. You can have both in place; obviously that's not secure, but I'll use it as I walk through my examples. You could eventually remove it, or, if you don't want secured inter-broker communication, you would keep listening on port 9092. There are some additional configuration settings here that point to where my keystore and truststore are and hold the passwords for them. The last thing is optional, so you don't have to set it, but you can set the protocols you allow.

Ideally you'd limit it to TLS 1.2; you could optionally allow 1.1 as well, and hopefully not 1.0, as it's no longer considered secure. The next step is to open up your SSL port so you can accept incoming requests for secure communication.
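
Pulled together, the server-side settings just described look roughly like this in server.properties (the paths, passwords, and hostname are illustrative, and the plaintext listener stays only for the walkthrough):

    listeners=PLAINTEXT://kafka.example.com:9092,SSL://kafka.example.com:9093
    ssl.keystore.location=/etc/kafka/ssl/kafka.server.keystore.jks
    ssl.keystore.password=changeit
    ssl.key.password=changeit
    ssl.truststore.location=/etc/kafka/ssl/kafka.server.truststore.jks
    ssl.truststore.password=changeit
    # Optional: restrict protocol versions; TLS 1.0 is no longer considered secure
    ssl.enabled.protocols=TLSv1.2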

Now that the server is set up, the next thing is to set up my client. On the client I just need to generate a truststore; only a truststore in this first step, unless I'm doing mutual TLS, and we'll get to that. I import the CA public cert into the truststore, which lets me confirm that the server's signed cert was signed by my trusted CA and keeps the chain of trust. The next thing is to set up the client properties, and then change my connection information to include them. So again, I create a truststore using keytool and import the CA public certificate into it, and then in my client properties I set the security protocol to SSL, point at my truststore, and give it my password.
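
A sketch of that client side, assuming the same ca-cert file (passwords and paths again illustrative):

    # Client truststore: only the CA public cert is needed for one-way TLS
    keytool -keystore kafka.client.truststore.jks -alias CARoot -import -file ca-cert

    # Client properties for one-way TLS
    cat > client.properties <<'EOF'
    security.protocol=SSL
    ssl.truststore.location=/etc/kafka/ssl/kafka.client.truststore.jks
    ssl.truststore.password=changeit
    EOF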

Now that's all set up, and I've saved it into a file called client.properties. In my connection information I'm going to create a topic, and you'll notice that when I create it (sorry, I went one slide too many; there we go, right there) I'm pointing at the ZooKeeper port, because ZooKeeper is responsible for creating the topic. So even though I've enforced SSL in my client properties, I can still create topics at will. But if I want to connect as a producer and produce to my new topic, called kafka-secure,

I need to pass in my client properties file as a parameter, and then I'm able to successfully produce messages to my Kafka broker. Let's look at what that looks like. Here my producer has the client properties passed in, but my consumer does not, and it's still listening on port 9092. So in this example I have both in place: I'm encrypting messages as I publish them, but they're not encrypted on the way out as I consume them. Still not quite what we want, so I'm going to update my consumer connection to use the same properties file.
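
Passing the file to the stock console clients looks something like this (the --producer.config and --consumer.config flags are the standard way to hand over the properties; hostnames are placeholders):

    # Produce over TLS on 9093
    bin/kafka-console-producer.sh --broker-list kafka.example.com:9093 \
      --topic kafka-secure --producer.config client.properties

    # Consume over TLS as well
    bin/kafka-console-consumer.sh --bootstrap-server kafka.example.com:9093 \
      --topic kafka-secure --consumer.config client.properties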

In this case my consumer and producer are both on the same box, so I'm just using the same properties file. I update my connection, and now that both are connecting on port 9093, I'm encrypting as I publish and encrypting as I consume. All right, let's look at the next thing: Kafka authentication. This is about authenticating the clients we're communicating with. We already have the CA public certificate in our truststores, so now we just need to get signed certificates onto the clients. Once we have the signed client certificates, that gives us mutual trust

and secured communication. To do that, we configure the Kafka server properties to require client authentication when connecting over SSL; there are a bunch of other optional parameters here too. Within the client, we've already talked about generating a keystore, so I'll skip through these pretty quickly: here we have our signed client certificate, we install it into our keystore, and we update our client properties to include the keystore. I've saved this as an SSL-auth properties file, and I'm going to start using it in my producer connection

and in my consumer connection as well, still communicating over my secure SSL port. So in this example I now have mutual TLS: as a client, I know I'm communicating with the server I intend to, and as a server, I know I'm communicating with clients that I trust.
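
A sketch of the mutual-TLS changes (the client keystore is generated and signed the same way as the broker's; file names are illustrative):

    # server.properties: require client certificates on SSL connections
    ssl.client.auth=required

    # Additions to the client properties (e.g. an ssl-auth.properties file)
    ssl.keystore.location=/etc/kafka/ssl/kafka.client.keystore.jks
    ssl.keystore.password=changeit
    ssl.key.password=changeit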

So what are some of the lessons we've learned here? A couple of things we found: Java is very slow at this, especially 1.8; we got significant increases in performance by going to Java 1.9. ZooKeeper, as I mentioned, doesn't support TLS, so enabling secure communication with it is going to take a lot of effort, or you'll want to take other measures to help secure ZooKeeper. Another thing we learned was to confirm cipher suite compatibility with the client libraries you use: our producers use a Go Kafka library called Sarama, which only supported a certain subset of cipher suites, so we had to make sure we were using one that was compatible. We also learned that this is just really tricky to automate; I'm doing it with one broker, one consumer, and one publisher, but you end up with a lot of keys, and you're going to want some sort of PKI management.

The other issue is that once you've issued these certificates, there's no easy way to revoke them; there's nothing in Kafka that handles that. To do it, there are a couple of options outside of Kafka that you can pursue, including the native Java certificate revocation list support. Jumping into authorization: authorization is the next level, where we do additional authentication of individual users and then manage what those users have access to. Ideally you're doing authentication between all of your brokers, from your brokers to your ZooKeeper, and from your clients to your cluster as well. Authentication comes in a few flavors.

There's PLAIN, which is just a simple username and password. There's SCRAM, the Salted Challenge Response Authentication Mechanism, which is again usernames and passwords but with a challenge and a salt. And there's Kerberos, which is ticket-based authentication and would be the recommended option to pursue. Kerberos is a pain, though, so it's going to hurt; hopefully you already have Kerberos set up that you can leverage, and an expert somewhere you can rely on. Today I'm going to walk through just doing PLAIN authentication, so you can see what it looks like. For that, we have to update our listeners, enable SASL and choose the inter-broker protocol, configure the Java

Authentication and Authorization Service (JAAS), and add users to it. On the client, we just need to change the client properties to include the username and password. What that looks like: where before I had PLAINTEXT and SSL listeners, I've changed SSL to SASL_SSL, and I have a couple of additional server properties telling it which SASL mechanism to use. I'm also telling it to use SASL authentication between brokers, which we won't be able to see today because this is a single-broker cluster. You can add the JAAS config to your server properties, or you can have an external file

and point to that. We've had some issues pointing to an external file and found it a little less troublesome to put the config directly into the server properties. If you want to do that, this is what the JAAS configuration looks like: in the server file, I add the SASL config under the listener name I want it on. The other nice thing is that if you use an external JAAS file, you can include it in KAFKA_OPTS when you're starting your Kafka brokers.
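
A sketch of the SASL/PLAIN broker setup described here (usernames and passwords are illustrative; the user_<name> entries define the accounts):

    listeners=PLAINTEXT://kafka.example.com:9092,SASL_SSL://kafka.example.com:9093
    security.inter.broker.protocol=SASL_SSL
    sasl.mechanism.inter.broker.protocol=PLAIN
    sasl.enabled.mechanisms=PLAIN
    # Inline JAAS config on the listener (the alternative is an external JAAS file
    # passed via KAFKA_OPTS=-Djava.security.auth.login.config=/path/to/jaas.conf)
    listener.name.sasl_ssl.plain.sasl.jaas.config=\
      org.apache.kafka.common.security.plain.PlainLoginModule required \
      username="admin" password="admin-secret" \
      user_admin="admin-secret" \
      user_kafka-client="client-secret";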

So here I've created a user called kafka-client and given it a password, and I include that username and password in my client properties, where I've also told it to use SASL. I also created a bad password so I can test this out. First I try the properties file with the wrong password, and we can see I get an error saying I have an invalid username or password. That's one of the nice things about adding authentication: you get somewhat better messaging when you're not able to connect successfully.
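
The matching client properties would look roughly like this (again a sketch; the truststore carries over from the TLS setup):

    security.protocol=SASL_SSL
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
      username="kafka-client" password="client-secret";
    ssl.truststore.location=/etc/kafka/ssl/kafka.client.truststore.jks
    ssl.truststore.password=changeit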

I can't remember if I had an example, but if you're using just encrypted TLS and you connect and try to produce, you don't get any message back telling you that it didn't produce, or why; it just doesn't produce. So this is a little better for troubleshooting. Now I change to the correct properties (let me hit play manually), I connect, and this time I'm authenticated with my username and password and the messages publish. So now I have authenticated messages.

The only thing is that my publishers and consumers can still do everything; everything is open to the world. So now I want to add some access control lists. Kafka ships with a pluggable authorizer for this, and the ACLs are stored in ZooKeeper. The typical operations are read, write, and create; the resources include the topic, the cluster, and the consumer group, and here are some examples of those. I'll jump through the configuration a bit: you edit your server properties, add the authorizer, and add super users, which is an important step, because with the ZooKeeper-backed ACL management it's really easy to overwrite and wipe out the authorization you've already set up.
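
The authorizer piece of server.properties is small; a sketch (SimpleAclAuthorizer was the built-in authorizer in 2019-era Kafka, and the super-user principal is illustrative):

    # Enable ACL enforcement and keep an admin principal that bypasses it
    authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
    super.users=User:admin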

Then we add some ACLs and edit the client properties. Here I'm adding the authorizer and setting up a super user, and then I try to connect. We can see I was unable to: I have no authorized access to kafka-secure. Because I've enabled the authorizer and I'm connecting as my user, but I haven't added any ACLs, I'm not able to do anything; same thing on the client side, I can't access the topic. So first I'm going to create a wildcard rule; there are no groups in these ACLs, so you can either use wildcards or specific named users.

I'll set it up so that any user can read from my kafka-secure topic. We see that it takes, and now my consumer (you can't tell here, but I've started it) is consuming, and I don't get an error message; my producer, though, does get an error that it can't access the topic to produce. So now, for kafka-client, the user I set up in my properties, I add the write operation and make sure that takes, and now I can see that I have secure messages, authenticated and authorized appropriately.
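
Those two ACL changes, sketched with the stock kafka-acls tool (ZooKeeper address and names are placeholders):

    # Let any authenticated user read from the topic
    bin/kafka-acls.sh --authorizer-properties zookeeper.connect=kafka.example.com:2181 \
      --add --allow-principal "User:*" --operation Read --topic kafka-secure

    # Let the kafka-client user produce to it
    bin/kafka-acls.sh --authorizer-properties zookeeper.connect=kafka.example.com:2181 \
      --add --allow-principal "User:kafka-client" --operation Write --topic kafka-secure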

Some of the lessons we learned here: again, Kerberos is a pain; I can't say it enough. Watch out for ACL changes being overwritten, as I mentioned, and make sure to set up a super user. Also, if you end up doing ACL management with SASL/PLAIN or SASL/SCRAM, there's a tool that helps manage those ACLs and puts appropriate auditing in place, which is useful. Another thing to consider is that ZooKeeper holds the ACLs and some metadata, so you want to separate ZooKeeper out and limit access to it. And for encryption at rest, really the best option is just to encrypt your filesystem.

If you're doing it in AWS you can use encrypted EBS volumes, or in HDFS you can encrypt there. A couple of resources: there's the official documentation, which is as dry as you'd expect but really thorough and helpful, and if you're on AWS there's a best-practices guide for that. A couple of other things I recommend: one is the O'Reilly Kafka book, and there's also a Kafka security course by Stephane and Gerd (I'm not going to attempt their last names); it's $15, super thorough, and helps you get through a basic setup. I think I'm out of time, so I don't have time for questions right now,

and I apologize for that, but if you want to find me, I'm happy to jump in and answer any specific questions you have. Thanks, everybody. Host: Thank you, Tyler. There was one question: what is the relationship between Kafka and GDPR? Maybe that's something you can... Tyler: Oh, sure. For GDPR you need to keep track of PII and ensure that it's secured, so for us it's ensuring that our data is encrypted in transport. Host: On behalf of BSides, we thank you for your presentation. Tyler: Thank you, I appreciate it. Thank you, everybody.