
The next talk we have is by Austin: YARA as a Service for real-time malware detection. So here is Austin.

Oh, thanks. Yeah, so as you notice, we sadly could not stick with the name. We envisioned "YARA as a Service," YaaS, but we were not allowed to use that name, so it's actually now BinaryAlert. The reason we call it that is that it's similar to Airbnb's other open-source security tool, which is StreamAlert. By a show of hands, has anyone heard of StreamAlert, by any chance? Oh, one. Wow. Okay, cool. So yeah, this is Airbnb's second contribution in the serverless open-source security space: StreamAlert
and now BinaryAlert, for binaries.

One of the themes of the conference, one of the things that's come up a lot, is that security tools have to scale along with the rest of the world. When you're designing a security tool, it has to be designed the same way any other product would be: it has to be designed to scale, and it has to be designed with profit and loss in mind. At Airbnb we're at fairly large scale and growing really rapidly. To give you a sense: if we count all of the unique executable binaries in some subset of our infrastructure, we currently have
about two and a half million unique binaries, whether that's from laptops, servers, and so on. To exacerbate the problem, that grows at a rate of almost 1% a day, on weekdays: people downloading new things, updating packages, and what have you. A large and exponentially growing base of files presents a large attack surface, so we need a way to analyze all of those. What we'd like to be able to do is detect whether any of these are malware. Every single binary is a chance for an attacker to gain a foothold, so we'd like to analyze all of them.

So YARA is the state of the art, from
our analysis, and there have been a couple of other talks about it; there's a really big one tomorrow about how to write effective YARA rules. YARA is essentially pattern matching for binaries. You can think of it as sort of a high-powered regex for identifying patterns in binaries. As an example, this is a YARA rule that we'll actually be including with BinaryAlert when we open-source it at the end of the talk today. A rule first has metadata, arbitrary key-value pairs you can use to describe the rule (and again, more talks tomorrow will go into detail about how to write good rules). Then there's, excuse me, a set of strings, which can
be textual or hex, and they allow regular expressions. This is where the pattern-matching magic happens, where you can say "match an arbitrary pattern." For anyone who was at the last talk: one of the things he was able to do was copy-paste code from other malware, modify like one thing, and it was not detected by anything. That's the problem YARA is intended to solve. YARA is supposed to be more generic, where you describe what classifies an entire family of malware. Maybe it's the decryption routine, or maybe it's a Bitcoin address in ransomware, or something like that. So YARA is designed to be more generic, to solve
that problem.

Since YARA is kind of the state of the art, and you see the security community using it more and more, we wanted a way to apply it to the enterprise. So we were looking: is there a low-cost, scalable, batteries-included solution that's easy to deploy and maintain that we can use in our infrastructure? We weren't really able to find anything that met our needs, so we built it ourselves. That is the idea of BinaryAlert: open-source, real-time, serverless malware detection. Open source: at the end of the talk today we're going to click the button and it'll be available
to use, and we're going to continue to contribute to it, like StreamAlert. Real-time, in the sense that if you have all these binaries, you need to know immediately if one of them is detected as malware. We'll talk about how it works, but as soon as a file is uploaded, it'll let you know within a minute or two whether an alert fired. Serverless, in that there are no servers to maintain: it's not an EC2 instance running, and it's not a box sitting in a data center. It uses Amazon's serverless Lambda component, which runs in a transient container, which makes it a lot easier to maintain and a lot more secure;
the attack surface is much smaller.

So what are the things you could use this for? BinaryAlert is essentially arbitrary file analysis. For example, enterprise alerting: you could imagine taking all of the binaries on all of your infrastructure (it doesn't have to be binaries; it could be any files, documents and whatnot), uploading all of those to BinaryAlert, and getting real-time alerts when any YARA rule triggers. You could imagine forwarding email attachments to BinaryAlert, scanning all of them, and getting alerts if anything triggers. User uploads: if you have an application that accepts arbitrary user input, anyone in security knows that accepting user input is like the number one way to compromise
yourself, so if you could upload those to BinaryAlert, you could scan them that way. And security research: because this is deployable in your own private AWS cloud, you can run it against your own collection of YARA rules and your own collection of binaries to do your own analysis without having to use a public service.

So, some of the features. We've kind of talked about this already, but real time: again, in our deployment, alerts actually fire within one to two minutes, usually, of binary discovery, which is pretty amazing. Someone downloads malware on their laptop and we'll get an alert within one to two minutes. Retroactive analysis: every time you change the YARA rule set
and you deploy, it will automatically reanalyze the entire corpus of binaries to see if anything new triggers. That lets you find threats in the past with information you receive in the future: you receive information about a new breed of ransomware, you retroactively analyze, and you realize that you actually had it installed on a laptop a few days ago. We include some of our own rules. One of the values that we at Airbnb are going to provide to the community is our own YARA rules, and we also source rules from other open-source projects, like the YARA-Rules project. The idea here is that all of these are rules that we've tested;
we've tested against our own infrastructure of more than two million binaries to make sure that they're effective for what we're looking for. We talked about the serverless design; I'll go into what the components look like in a minute, but again, there's no EC2 instance running, there's no SSHing in. It's just an asynchronous, event-driven framework. To give you a sense of one of the benefits of that: the way Lambda pricing works in Amazon is you pay per invocation and you pay for how long it runs, so you only pay for what you use. For example, a million invocations of the YARA analyzer with 512 megabytes of memory,
running for 30 seconds each, which is really long, only cost about 25 dollars. So you can imagine this is much lower cost than, say, an enterprise solution might be, or running a server that you would upload things to. Infrastructure as code: like StreamAlert, we're big fans of Terraform, another open-source project. The benefit of Terraform is that it manages all these AWS components for you, so all you do is run a single command that spins up all the infrastructure. It's described in parsable configuration files, which makes it easy to audit and easy to test, you can update infrastructure just as easily, and so on. And finally, metrics and monitoring: just like any production
service, you need to know when BinaryAlert is healthy and when it isn't, so we include all of that for you. It automatically hooks into CloudWatch, which is Amazon's version of infrastructure monitoring, and if anything goes down, or any component is behaving abnormally, and so on, those will also fire an alert.

One of the common bits of feedback that we get is that this is similar to VirusTotal, because VirusTotal also offers a file upload service where you can scan files against YARA rules. VirusTotal is a public service; it's designed to be something that benefits the whole community, so there's a giant corpus of every file that's ever been
uploaded to VirusTotal. There are billions of them, and that's a useful resource. So BinaryAlert is not a replacement for VirusTotal so much as an augmentation, because in an enterprise you might have files of varying levels of confidentiality and sensitivity that you can't upload to a public service. But you could imagine there's some space in the middle where you can still contribute to VirusTotal: there are some things perhaps you could upload, or query VirusTotal for, so you could use the two in tandem. BinaryAlert is deployable, again, in your own AWS account, whereas VirusTotal is a managed service.

So let's talk about how it works.
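Since everything in the architecture revolves around rules matching files, the rule structure described earlier (metadata, a set of text, hex, and regex strings, and a condition over them) can be made concrete with a small sketch. This is not YARA and not a rule from BinaryAlert: it is a simplified stand-in written in plain Python, and the `Rule` class, the helper names, the example patterns, and the "at least N strings" condition are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Simplified stand-in for a YARA rule (hypothetical, not the YARA engine)."""
    name: str
    meta: dict                                          # arbitrary key-value metadata
    text_strings: dict = field(default_factory=dict)    # name -> literal bytes
    hex_strings: dict = field(default_factory=dict)     # name -> "E8 ?? 31" style hex
    regex_strings: dict = field(default_factory=dict)   # name -> bytes regex
    min_matches: int = 1                                # stand-in for "N of them"


def hex_to_regex(hex_pattern: str) -> bytes:
    """Turn a hex pattern with ?? wildcards into a bytes regex."""
    parts = []
    for token in hex_pattern.split():
        if token == "??":
            parts.append(b".")                          # wildcard: any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    return b"".join(parts)


def matched_strings(rule: Rule, data: bytes) -> list:
    """Return the sorted names of the rule's strings found in data."""
    hits = []
    for name, literal in rule.text_strings.items():
        if literal in data:
            hits.append(name)
    for name, hex_pattern in rule.hex_strings.items():
        # DOTALL so that a wildcard byte may also be a newline (0x0A).
        if re.search(hex_to_regex(hex_pattern), data, re.DOTALL):
            hits.append(name)
    for name, pattern in rule.regex_strings.items():
        if re.search(pattern, data):
            hits.append(name)
    return sorted(hits)


def rule_matches(rule: Rule, data: bytes) -> bool:
    """The rule fires when enough of its strings are present."""
    return len(matched_strings(rule, data)) >= rule.min_matches


# Illustrative rule for a made-up ransomware family; every pattern is invented.
EXAMPLE_RULE = Rule(
    name="hypothetical_ransomware_family",
    meta={"description": "Illustrative only", "author": "example"},
    text_strings={"$note": b"your files have been encrypted"},
    hex_strings={"$stub": "E8 ?? ?? ?? ?? 31 C0"},
    regex_strings={"$btc": rb"\b1[1-9A-HJ-NP-Za-km-z]{25,34}\b"},
    min_matches=2,
)
```

Because the condition asks for any two of the three strings, a variant that rewords the ransom note but keeps the same code stub and payment address still matches, which is the "family" idea from the talk in miniature.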
So you have files. We all have files, and files can come from anywhere. It's up to you, the organization that wants to use it, which files you want to analyze and so on. All you have to do is get them into an S3 bucket, and that will start the analysis chain; how you want to set that up is up to you. Once a file is in an S3 bucket, then the magic happens. We're going to put it into a queue, and the queue is grouped into batches that are sent in parallel to multiple analyzers. Each of these orange circles is a Lambda; that's like a unit of computation, the serverless component I was
talking about. Of those Lambda functions, the dispatcher runs every minute, and the analyzers just run when they're told to. So if nothing's happening, if you don't have any binaries, nothing's running other than, I guess, the dispatcher, which is just checking the queue; for the most part, nothing runs until you upload something. Once things have been analyzed, if there's a match, it will send an alert to SNS, which is Amazon's notification service, and you, the organization, can decide where you want it to go. You can set up email subscriptions, or SMS, or what have you. We use StreamAlert for this, because StreamAlert already has support for all of the
different outputs you might want: Slack, PagerDuty, and things like that. So the organization just defines the input (what files you want to analyze) and the output (where you want the alerts to go), and we take care of the rest.

But there's a little more we do here, just to make this a production-ready service. YARA matches are also saved to Dynamo, which is Amazon's NoSQL database service, just to have a record of every match that ever happened. As I said, we have automatic monitoring that checks the health of BinaryAlert. BinaryAlert uploads metrics about its processing throughput and so on, and those will also trigger an alert if anything is amiss,
and that alert can go to the development team instead of an analyst team. And finally, the retroactive analysis component: there's a separate batcher, and every time you deploy BinaryAlert and change the rule set, it'll pull all of the binaries (again, it doesn't have to be binaries; all of the files) from your bucket and put them back in the queue to be reanalyzed. So that's the full architecture. Again, all of this is built for you by Terraform, so you don't really have to worry about any of it, but that's how it works under the hood.

I figure it's most useful to actually show this, and because
of the nature of Terraform and so on, we can actually do it pretty quickly. So now we need to mirror the display; let's do that fast. Where's the arrangement... there we go. So we have a terminal and a virtualenv, and this is the repo that we're about to open-source. We have a dummy email account, and we have a dummy Amazon account with nothing in it: here's S3 with no buckets or anything. And then we have the repo that we're about to open-source. So here's the account, everything. Once we've got that all set up, all we're going to do
is just deploy. What this does is invoke the equivalent of terraform apply, which is going to build each Amazon component, set up the configurations for which components notify each other, and so on. It usually takes a minute or two, but if anyone has had experience deploying production applications, a one-to-two-minute automatic deploy like this is a pretty nice experience. While that's going: the only part we have to do manually right now is set up the SNS subscriptions, which is, of course, conveniently hidden. With SNS, like I said, you can set up email subscriptions or SMS subscriptions. We're going to set up an email subscription,
because you have to confirm it, so you can't actually automate that part yet. The documentation explains how to set it up. So there it is: there are the topics that were generated. These are the two separate SNS topics, one for alarms about the service itself and one for YARA match alerts. We're going to subscribe an email address to this SNS topic: email, the BinaryAlert test address, create subscription. Then we're going to confirm the subscription, and now we should be good to go. So that's it; now it's sitting there waiting for input. We have, well, more than two, but we have two S3 buckets, here we go, that were
automatically created. This one holds the binaries that are going to be uploaded, and this one is just access logs. Any file that gets uploaded here should automatically get analyzed and trigger an alert, so we're going to try it. I asked my manager earlier today, "Can you send me malware that I can put on my laptop and upload?" So we actually found Windows malware, since I'm on a MacBook. This will match one of the YARA rules that we include in the BinaryAlert repo. So we're going to upload this... and the screen is so small I can't even click. Okay, so we're going to cheat: I'm going to unplug this for a
second, I'm going to hit upload, you're going to believe that I did that, and I'm going to put it back.
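The flow that this upload exercises (file lands in the bucket, its key goes into a queue, the dispatcher hands batches to analyzers, matches are recorded and alerted on, and a rule change re-queues everything) can be sketched in a few lines of plain Python. This is an in-memory model under stated assumptions, not BinaryAlert's code: the `Pipeline` class and its method names are hypothetical, rules are reduced to byte substrings instead of YARA, and skipping matches that were already recorded is a choice made here for the illustration.

```python
import hashlib
from collections import deque


class Pipeline:
    """In-memory stand-in for the bucket -> queue -> dispatcher -> analyzer flow."""

    def __init__(self, rules, batch_size=2):
        self.rules = dict(rules)      # rule name -> byte pattern (stand-in for YARA)
        self.batch_size = batch_size
        self.bucket = {}              # stand-in for the S3 bucket: key -> file bytes
        self.queue = deque()          # stand-in for the queue of keys to analyze
        self.match_table = set()      # stand-in for the table of recorded matches
        self.alerts = []              # stand-in for the notification topic

    def upload(self, key, data):
        """Store a file; storing it also enqueues it for analysis."""
        self.bucket[key] = data
        self.queue.append(key)

    def dispatch(self):
        """One dispatcher tick: drain the queue in batches, return the batch count."""
        batches = 0
        while self.queue:
            size = min(self.batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(size)]
            self.analyze(batch)       # in the real system, analyzers run in parallel
            batches += 1
        return batches

    def analyze(self, batch):
        """Analyzer: hash each file, apply every rule, record and alert on new matches."""
        for key in batch:
            data = self.bucket[key]
            md5 = hashlib.md5(data).hexdigest()
            for rule_name, pattern in sorted(self.rules.items()):
                if pattern in data and (md5, rule_name) not in self.match_table:
                    self.match_table.add((md5, rule_name))
                    self.alerts.append({"key": key, "md5": md5, "rule": rule_name})

    def deploy_rules(self, rules):
        """Retroactive analysis: swap the rule set and re-queue every stored file."""
        self.rules = dict(rules)
        self.queue.extend(self.bucket)
```

With a batch size of two, uploading three files and calling `dispatch()` processes two batches; calling `deploy_rules()` with an extra rule and dispatching again only raises alerts for matches that were not already recorded.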
Okay, so we uploaded the file to the bucket, and literally in the time it took me to do that, we already got an alert that says this MD5 matched this YARA rule, with this metadata: the Mimikatz credential-dumping tool, which is a Windows pen-testing tool that's on GitHub. So just like that, that's it. We have this running, and files uploaded to S3 trigger alerts, which is pretty awesome. So that's pretty much it.

Like I said, we're committed to supporting this in the future; like StreamAlert, we'll continue to support the repo going forward. A lot of open-source projects kind of fall on the floor if they don't
have sufficient support, and we at the Airbnb security team are dedicated to supporting it. So congratulations, everyone, on being here until 8 p.m. on a Tuesday. I'm the only thing standing between you and a night in Vegas, but you get to witness the birth of our newest public repo.
And I have 1Password, and that's not even set up on this laptop, so I guess I'll have to do that right after. But yeah, this will happen in just a few minutes. Bring that back to here. I don't know what time it is, but if we have any quick questions... There's also a blog post coming; it should be up tomorrow.

So yeah, we upload all of them. What we basically have is a pipeline where, as soon as binaries are detected on a machine (that's the part you have to set up yourself for now), we upload them to S3. You can imagine osquery from
Facebook, their open-source project, which actually has file carving capabilities. So however you want to get the files into S3, but yeah, we upload everything. Oh, sorry, yeah, I'll repeat the question for the recording.

Great. You quoted $25: is that an hour, or a day, [inaudible] all those services in AWS you had?

Exactly, and that's a good point. The $25 is the current pricing for running 1 million invocations of an analyzer Lambda function. Obviously, with all these components, there is a little bit more cost to them, the S3 bucket being the biggest one. If you have 10 million binaries and you're uploading all of them to S3, there's a cost associated with that, but
that would be true with any service. What we're saying here is that the computation cost is significantly lower than if the files were being analyzed by a running server. But yeah, there are costs. Amazon actually has a free tier, and in most cases a lot of those will fall into the free tier, including Lambda. I don't want to quote a number, I'd have to look it up, but Lambda gives you a certain number of free invocations, so for a small company or a small organization you might actually be able to run it for the
Have you tested this out with Suricata, any kind of file reassembly, and then uploading from there?

Sorry, say again?

The TCP and file, you know, stream reassembly in Suricata: pushing the files to a memory storage engine and then making that part of a pipeline to push up to your service.

Oh, so yeah, again, it's up to the organization exactly where you get the files from. You could imagine, yeah, actually pulling them out of memory. I can't comment on exactly how we're doing it right now, but the simplest solution would be to just pull them from disk: as soon as a file hits disk, say on a production server, you
would just forward it to S3. But yeah, there's a lot you can do here: if you have a good way to pull things out of memory, you could dump them in S3, or email attachments, and so on. Hopefully in the future we might be able to support specific methods of getting files, but we at least have the analysis part down.

Nice. Any more questions? Okay. Thank you, Austin. Thanks a lot.