
Fighting Email Phishing with a Custom Cloud IDS

BSidesSF · 2017 · 18:51 · 341 views · Published 2017-03
Category: Technical
Style: Talk
About this talk
Phishing is one of the largest and most difficult challenges for any enterprise security team. It’s the great equalizer of security: we all have to deal with it. We built our own email IDS at Uber, and control over features and alerts means we can adjust to evolving threats in real time. But just as important, we demonstrated that security investments can also drive operational benefits in price, extensibility, and performance. This talk will walk through how building our own email IDS in AWS helped guard against phishing and improve operations.
Transcript [en]

These are picked up by a Lambda function, and this is the first real workhorse of the IDS. It parses the emails into three parts: you have your attachments, and then your headers and your bodies, and these all get dropped into separate S3 buckets, where they're then analyzed in parallel. We're going to go into each one of these legs really quick. For the attachments, once these are picked up, we have a Lambda function which submits them to a bunch of static analysis services as well as dynamic sandboxes. These then all alert asynchronously back down to our alert console, our Phantom enterprise application. We also look at headers. Here

we look for things like the sending servers as well as SPF and DKIM checks, and we look for people spoofing emails. We also do several other checks for X- headers, stuff like that. The most important part about our headers is that we log all of these to an Elasticsearch instance. We use this for contextualization throughout the hit; we'll see how that plays in later. It's a really useful tool because it lets us ask, for any infection or URL or attachment: how did it get into our environment? Did it come from the email layer? That's an important question you want to ask: how was I compromised, what was the initial vector? And then we also do body analysis.
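
The parsing and header checks described above can be sketched with the Python standard library. This is a minimal illustration, not the speaker's implementation: the function names `split_email` and `auth_results` are hypothetical, and it assumes an upstream mail server has already stamped an `Authentication-Results` header with SPF and DKIM verdicts.

```python
import re
from email import message_from_bytes, policy

# Matches verdicts such as "spf=fail" or "dkim=pass" in Authentication-Results.
AUTH_RESULT = re.compile(r"\b(spf|dkim)=(\w+)", re.IGNORECASE)


def split_email(raw: bytes):
    """Split a raw message into headers, text bodies, and attachments.

    Hypothetical helper: in the talk each of the three parts is written to
    its own S3 bucket; here they are simply returned to the caller.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    headers = [(name, str(value)) for name, value in msg.items()]
    bodies, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.is_attachment():
            attachments.append((part.get_filename(), part.get_payload(decode=True)))
        elif part.get_content_maintype() == "text":
            bodies.append(part.get_content())
    return headers, bodies, attachments


def auth_results(headers):
    """Collect SPF/DKIM verdicts recorded in Authentication-Results headers."""
    verdicts = {}
    for name, value in headers:
        if name.lower() == "authentication-results":
            for mechanism, verdict in AUTH_RESULT.findall(value):
                verdicts[mechanism.lower()] = verdict.lower()
    return verdicts
```

A rule could then alert when `auth_results(headers)` reports anything other than `pass`, and the header list could be indexed for later lookups.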

Here we're looking at things like natural language processing as well as exploding URLs and links, and then we use vendors who are really good at looking at things like lookalike domains or newly registered domains, or the passive DNS on a bunch of IP addresses we're seeing traffic from. Then there's one more component which is missing here, which is a memcache server, and we use that to deduplicate rapid events. All of this goes down to that Phantom analyst console that I was telling you about, and the real power there is that we normalize all of the data going into Phantom, so it all comes in as this CEF model, which is a

common event format, and once it's in that common event format we can run these Python playbooks on it programmatically. So we can say: if you ever saw a file hash, check the whole fleet to see whether that file was on any of our endpoints, and then give us that impact analysis: how many people downloaded it? What's really nice is we've integrated this with our email IDS, so we can say: that binary came through the email layer; you may see four of them on your fleet, but ten people actually received the email containing that binary. And then again, it lets us take these programmatic actions on those

alerts, so we can go tag those emails, we can quarantine those binaries, and even quarantine those servers if we need to. This is an image of a Lambda debug log. This is really the workhorse of our email IDS; this is where we're processing email with high concurrency and in parallel. Here you see three different emails being parsed in what looks like two seconds, and this is really, like I was saying, the meat and potatoes of the IDS. If you've never used AWS Lambda, it's really nice: you can basically just write code and deploy it, and then it does the auto scaling. But what is really difficult with this is that you can't get a sense of, holistically, how is my

IDS functioning. If you were to try and look at these debug logs for errors, you'd go snowblind really quick because there's just too much detail here. So we get another really nice feature automatically with AWS, and that's CloudWatch metrics. We use CloudWatch metrics all throughout the IDS to get a holistic sense of how it is performing. Again, this has a high watermark of up to about 60,000 emails an hour, so it's highly concurrent, and if it's choking on any one thing we really need to understand those errors and then work them out of the system. Again, this is really important for us because we're a global

company, with tons of people speaking different languages all over the world, sending email 24/7. So if our email IDS is choking at 3:00 a.m. on some foreign language, we need to know, because then we can go and address those problems. We have to have those alerts just around performance and make sure our tools are functioning right. Another really nice, almost automatic feature out of AWS that we use is these Elasticsearch headers. We use these programmatically with our back end, so that way they can go and compare campaigns. They can say: we got this one attachment from this sender, but it also came from all these other senders. So we can do

really nice campaign analysis. Here you're seeing the Kibana front end on top of Elasticsearch, and this is just nice for hunting. This is another nice analyst tool where we can go and say: show me all those sending servers during this one period of time. Then we can understand: was that compromised? Were those different sending servers sending the same attachment, perhaps as part of a botnet? Is this a related campaign? So again, this is a really nice analyst tool that kind of comes with it. What we're seeing here is that Phantom application back end. Basically, we get all of these normalized alerts and then they go to the single

pane of glass, and then we can have people do hit review, look at these alerts, and initiate some type of manual action. What we're looking at here is a dynamic malware analysis alert: we had a sandbox trip on a piece of malware, and it's raising the alert. What the analyst, or the engineer, whoever's looking at this alert, gets automatically is tons of enrichment and context around this hit. We can see right away we have ReversingLabs results, we have VirusTotal results, we have custom Uber widgets which tell us how many times we've ever seen this file before, we have

our dynamic sandbox results, we have endpoint agents which tell us this file has never hit the fleet, and then we even built a custom widget for our email IDS. This tells us that this email came through the email layer, that we have two recipients that received it, the time they received it, and who it would have been from. That's extremely helpful when you're just trying to get an eyeball view of the problem: I have this piece of malware on my fleet, where did it come from? Not only that, but then we have the playbooks, which go look up that alert in Elasticsearch and then reach back to

that S3 container, grabbing that raw email and attaching it to the hit. This is extremely helpful for an analyst, because then they can look at this raw email and say: was this a phish, or was this not a phish? Now, that creates a myriad of problems, because you could still have malware and not necessarily a phish, but that's really important information to understand, because it means maybe your third-party vendor, whoever you're working with, isn't trying to phish you but has been compromised. Then we can also take user-guided actions from Phantom. Here's a user taking that email (this is actually an email that came in through our phishing queue) and

they're taking that email and deleting it for this one user. We've redacted the users for our own privacy, but in this case they're able to delete the email from that user's inbox. We also run our own planned phishing assessments to help raise user awareness and user education. This is one we had run a while back, but in this case we had gone back, and now we have the ability to programmatically tag all of these emails that came through our IDS with phishing labels. This is really nice because it moves the email out of the user's inbox into a separate email container labeled phishing. So let's say the IDS has a false positive: we're not deleting people's email, but

we're at least containing it and letting them know we programmatically detected that this may be a phishing attack. Not only do we have these alerts in our analyst console, but we need to receive them in real time. My team is a heavy user of ChatOps; we really believe in that methodology. We live in our chat consoles, and we also do a lot of analysis and programmatic actions right from chat. What we're seeing here is Phantom alerting our chat that we got a display name phish. For those of you who are unaware, display name phishes are when somebody registers a legitimate common email address, such as Gmail or Hotmail, but then they set their username or

their common name as an executive or somebody in your company who's really important, and then they just try to social engineer somebody into giving them something. This is an extremely powerful tactic because it's hard to identify: it relies on just social engineering. In this case we're in chat, we get this high severity alert, and we can instantly click the alert. It launches our console, and then boom, there's the email attached right to the alert. So I just went from chat to triaging this in seconds, and I can quickly tell this is a false positive, so that was easy to whitelist. But this is actually an example of a true positive.
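
A display name check of the kind described above might look like the following sketch. The talk does not show the actual rule, so everything here is an assumption: `PROTECTED_NAMES`, `CORPORATE_DOMAIN`, and `is_display_name_phish` are hypothetical names, and the example values are placeholders.

```python
from email.utils import parseaddr

# Hypothetical configuration: names worth impersonating and the real mail domain.
PROTECTED_NAMES = {"jane example"}
CORPORATE_DOMAIN = "example.com"


def is_display_name_phish(from_header: str) -> bool:
    """Flag a From header whose display name matches a protected person
    while the address itself is outside the corporate domain."""
    display_name, address = parseaddr(from_header)
    # Normalize case and repeated whitespace before comparing names.
    name = " ".join(display_name.lower().split())
    domain = address.rpartition("@")[2].lower()
    return name in PROTECTED_NAMES and domain != CORPORATE_DOMAIN
```

An exact name match keeps the sketch short; it would miss small misspellings of the name, and known personal addresses of executives would need to be whitelisted, as the speaker describes doing for false positives.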

So here we have somebody impersonating Travis Kalanick, who is the CEO of Uber, in the news all the time, very popular guy. Here's a fake email address somebody registered for him, and they're emailing somebody in the company, saying: hey, can you provide those W-2s? In this case we were able to respond so rapidly, in real time, that we were able to call the recipient of this email within minutes of them receiving it. Not only did we delete it, but we also made sure they didn't act on it. So those are some of the ways we respond to this. But then we also see phish that we haven't planned for,

that we don't have rules for in our IDS immediately, and these often get reported to us by savvy engineers, savvy people like yourselves working in the company, who say: I think you guys should be aware of this. What we do there is review these every few weeks and make sure that we can then see them in the IDS moving forward. This is your classic hunting and then writing signatures, the bread and butter of detection and response. So here we had an email that came in through our phishing queue, and this was really easy to write a rule on because our email IDS

already parses all of this information, so we can programmatically inspect the hyperlink in the DOM. Here we have an HTML hyperlink, and the actual link itself points to a different URL than the display text of the hyperlink. All you have to do is check the protocol, make sure it's a full URL, and that it's a different domain, and that's a really easy alert that can prevent your people from ever seeing this. This is another one where we went back and hunted. We have this Google Apps layer, and we thought this Gmail layer should really be preventing us from receiving all this spam and stuff, but we ended up getting

these messages anyway, which still contained these phishing links. So we added the parsing of these HELO servers and these Received-by servers to our IDS, and then we were able to harness all of the other intelligence services that we're already using across the organization and reuse that value in our IDS, just by parsing another field of the protocol. So again, we had this few-hundred-thousand-dollar expensive blinky box, and we got so much more power out of just rewriting these tools and parsing the basic structures ourselves, and then reusing the other intelligence services we already had, versus having all this siloed equipment.
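
The hyperlink rule mentioned earlier (the displayed text is a full URL, but the link points somewhere else) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the speaker's rule: `LinkCollector` and `mismatched_links` are hypothetical names, and "different domain" is approximated here by comparing full hostnames.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(url):
    """Return the lowercase hostname of a full http(s) URL, else None."""
    try:
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.hostname:
            return parsed.hostname.lower()
    except ValueError:
        pass  # malformed URL, treat as not a full URL
    return None


def mismatched_links(html):
    """Return (displayed URL, real href) pairs whose hostnames differ."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    hits = []
    for href, text in collector.links:
        shown, target = _host(text), _host(href)
        if shown and target and shown != target:
            hits.append((text, href))
    return hits
```

Links whose visible text is not a URL (for example "click here") are ignored by this sketch, which matches the "make sure it's a full URL" step. Comparing registrable domains instead of hostnames would reduce false positives from sibling subdomains.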

We also set up some advanced intelligence services. What you're seeing here is an example where we were able to detect people standing up new phishing sites against us, and then we could see the victims falling for these sites in near real time. This is pretty huge because it lets us get in front of account takeovers (ATO). We can understand when the victims, the users of our application, have had their credentials stolen, but before the accounts are actually taken over, which is huge from a risk and fraud prevention point of view. It's also allowed us to attack the phishers, in a way. Here we see an example of a

website which was running WordPress and then was compromised. We could then brute force the extensions on this one part of the URI, and we were able to get the phishing kit itself. Once we got the phishing kits from these sites, we were able to deobfuscate them and pull out who was receiving all of this phishing intelligence. Then not only were we able to get that email, that domain, all that stuff submitted for takedown, but we could also look up those people in our own system, quantify potentially how much they had gained, and then work with investigations to take those people to law enforcement. So, a quick review: basically, if you take the extra time

to parse the protocols yourself and write the alerts yourself, you suddenly get much more value, because you're able to write these rules on the fly versus just trusting what some box tells you is bad. That's awesome because then we can stay very current on attacker trends and attacker technologies, and we can add new rules to catch those campaigns in the wild as they're targeting us. It also lets us reuse value. I was talking about those intelligence services: we have an intel arm in our organization and they already pay for a lot of these services, so rather than paying another 100 grand for one of these boxes, we're able to reuse all that value we already have

and apply it to our IDS. And then the integrations are massive: having the ability to go back and whitelist stuff or delete it is huge, as well as just being able to see: did it come through this vector, is this how I was compromised? And that's it. Any questions? [Applause] Yeah, the presentations are going to get released later this week, and we're also planning on open sourcing this project. There are two major roadmap milestones before we can open source this. Right now we're working on decoupling the rules from the actual infrastructure, so that people can also share email and phishing rules for current campaigns, which will be nice. And then we also want to Terraform the whole thing, much like

StreamAlert, so it's easy to deploy for anybody else. Question? Right, the question is how do we store sensitive data. I can't go into a hundred percent of our details, but we have numerous controls, and we've had the controls reviewed by sister groups in the organization. I can tell you that no human, like me, can go access any raw email; only the programmatic systems can do that and then attach it to the things. So I don't even have that access. And not only did I take that access away from myself and the whole team, but we wrote alerts on the controls for people to get that access, so if anybody were to change those controls we would be alerted

in real time. Right, so in that regard, do we strip some of the content from the email so that the analyst only gets some of the content? The analyst is only getting the content relevant to the alert, so they're only receiving that one email relevant to the alert. And on top of that, like I was saying, those can only be accessed programmatically from automated systems.

The question was how do we deal with an analyst coming across sensitive details in a specific email. I haven't necessarily seen that exact case, but usually in those cases we whitelist whatever that content was so we will never see it again. We do that with almost all of our false positives, just as good IDS hygiene, so that it doesn't keep showing up and we don't have a lot of noise. Question? Right now we're doing it ourselves, and it's very simple Bayesian classifiers. It's nothing advanced, but that is something that I would like; again, I just want to be able to parse the bodies and then hopefully look towards a third

party that specializes in that kind of stuff. Like, again, with the URLs: we don't write so many alerts on the URLs, but we look towards vendors that specialize in lookalike domains or newly registered domains. So right now that stuff's kind of proof of concept, but I'll be looking towards integrations in the future. Another question? I'm not a hundred percent sure where the slides will be posted online, but I'll work with BSides to make sure that happens. Thank you guys, I'm going to take the rest of the questions offline. [Applause]

Thank you, Dan. On behalf of BSides SF and Fitbit, we'd like to present you with this gift. Thanks so much for your contribution, and to everybody, please feel free to submit your feedback and questions online. Thanks.