
Grinding Phishing into Detections

BSides Boulder 2022 · 57:03 · 22 views · Published 2022-12 · Watch on YouTube ↗
Tags
Category: Technical
Style: Talk
About this talk
Credential phishing seems like it has always been around, but it really exploded a number of years ago when exploit kits went away and ransomware and maldocs took over. In the years since, security analysts have been writing detections and watching the landscape shift into what it is today: a big ol' mess of JavaScript, abused services, redirections, and varying levels of "sophistication". This talk will not be a phishing 101 conversation, but rather will demonstrate various aspects of detecting modern credential phish with popular open source tools such as YARA, ClamAV, and Suricata, with a healthy dose of regex.
Transcript [en]

Shaking up all the tickets. Making sure that it's all fair, making sure that nobody sees me just sticking my fingers in to pick mine.

Just kidding. I'll give you just a few more seconds.

Go ahead and start getting back in here and getting started. A couple things before we do, though. I do wanna just give a reminder that we can only have water in water bottles in here. So all food or drink has to be outside of here. So please, just water bottles in here, because we like this space and we want them to allow us to come back next year.

If you need a water bottle, there are water bottles out on the desk or on the table. Thank you. Also, something that I was very excited about that I forgot to mention at the opening: your badges are plantable. You can plant them and they will grow wildflowers. So if you do that, I would love to see those pictures. That's all for that announcement. You can, I guess, tweet BSides Boulder and I will get to see some of those. I would appreciate those. That makes me excited. And then finally, another giveaway. We have two giveaways, and I'm going to have someone draw this time.

pick two tickets or one at a time. We'll do one at a time. So these ones are for the offensive security

fundamentals subscription. That's what it was. You probably can't stick your hand all the way in there. So I don't know about that. This way if he doesn't pick your number, you don't get mad at me. Thank you. Let's see, 2338247.

This guy.

And then I will get your email address. And

again, for another offensive security fundamentals subscription. There's like a bigger word for that, but I'll show what that is later. Let's see, 2338269.

See, this is the benefit of coming back early from lunch. Nobody? Nobody? All right, next. Thank you.

2338291. Yes, we got it. All right. I will get your email address, I will get your email address, and we will get those out to you. And those are the only things I have, so. Take it over. All right. Cool. So our first speaker for the afternoon is Jason Williams. Jason is Principal Security Researcher at Twinwave, where he works on phishing detection, and he'll be presenting Grinding Phish into Detections. Please give him a warm welcome. Thanks, everybody. Also, thank you BSides for having me, and also thank you for that fantastic lunch. I hadn't had that sandwich before, and now I have. All right, so today we're gonna be talking about grinding phish into detections, and for the purposes of this

talk, what that means is credential phish and everything that happens after your user clicks on the thing they shouldn't click on. Not everybody is into this sort of thing, but I thought it would be a great idea for a talk because it's a great opportunity to learn some new open source detection languages. There are a ton of phish out there. There are a ton of phish kit sources out there. There are a ton of web pages that are live and URLs that are live. There is no end to the subject matter you can find to practice writing detections on. And there are not a lot of people doing this. So it's a great opportunity to learn

something new, I think. This will be a bit of an info heavy talk, but it's not going to be super, super comprehensive. We're going to go through a lot of different modern tactics and how to detect them. But that could easily be a two day training. If you want to grab all these slides, feel free to scan this QR and grab that. And I'll just give you a second to go there.

Yeah, that's a link to Jeff Goldblum laughing for 10 hours, if you really want that. But I actually do have a link at the end that has all of this stuff. So there will be a bunch of links on the slides. Don't worry about taking pictures of things; it's all going to be on my GitHub. So yeah, it's there, which is not a phish page. So just a little bit about me. I spent a lot of years writing rules for the Emerging Threats IDS ruleset, both on the open and pro side, and I worked in the SOC for a great number of years doing malware analysis, incident response,

and forensics. I got interested in phish because there was an opportunity when the EKs kind of dried up, and we didn't have a lot of great detections around social engineering, including phish kits. So I built a bunch of anti-phish things. And I get to teach a signature development class with Jack Mott, who talked earlier. We haven't gotten it done for the last three years because of COVID and whatnot, but we're really excited to kind of start spinning that up again. That's two days where we just teach how to write rules for all the bad stuff that's out there. And I am currently a principal phish puncher at Twinwave. When a phish is found, a

lot of times I see people say, hey, I found a phish. They'll make a tweet, they'll put a screenshot up maybe, they'll tag Hostinger or one of the many service providers that's hosting all of these things, and they'll kind of move on. We can do a lot more. There is a great opportunity here to kind of share with the community because, as we'll get into in a little bit, phish is a template game. And we're not really gonna get into takedown, because that process is kind of terrible for everyone involved. Even for the people that you're reporting the takedown to, it may not be their primary job, may not be something that they know what

they're supposed to do. It can be quite a futile effort sometimes. So why would we spend time to detect phish? Phishing is a game of templates, and it has been since the early 2000s. Repeated tactics, techniques, and procedures over and over and over again. If you write one decent detection, you can detect thousands and thousands of phish going forward in the future. We know this because I wrote ET rules in 2015 that still hit today. And contributing what you're seeing to some sort of central repository, the Emerging Threats list, some sort of community, will make things better for others because they don't have to go and write those signatures. So how do we go about this? So number one, you need

lasagna. In the detection game, there is no one thing that is going to detect everything. You need image scanning, you need hash detections, you need source string detections, you need all this kind of stuff. Because a miss on one aspect of the thing that you're trying to detect shouldn't mean that you lose all of your detections. I like to kind of quantify this by saying I write a lot of nanobot signatures, not the malware NanoBot, but little teeny tiny things that'll add up into one big giant little monster of detections, so that when this thing comes through, I can see exactly what happened. I can see this big story. Don't expect to always write one killer rule that's gonna destroy everything. Write lots of little

rules, because it's way easier and you're gonna have much more fun. Today's detection platforms: we're gonna talk about Suricata, Yara, ClamAV, and my favorite, regular expressions. So just really quickly about these technologies, if you haven't heard of them before. Suricata looks at network traffic. You can deploy it on your network as an appliance (it's open source) or as some sort of application in the cloud. In my opinion, this is much better than Snort for writing detections and for working with network traffic. Suricata is much more than an IDS; it's really a network security monitoring system. So it's gonna generate a ton of metadata about everything that's happening on your network, giving you a lot of tools

you need to tell the story about what's going on, and you can easily write your own rules or import them from the community. We're also gonna get a little bit into Yara. If you haven't heard of Yara before, it's great for detecting things within files. It's extremely straightforward to write rules; it's a very accessible language, and you can simply point it at a file or folder, or use some of the various tools that other security researchers have written to get the scans done. We're gonna talk about ClamAV. ClamAV is a personal favorite of mine. It detects things in files just like Yara, and ClamAV can unroll archives and find things within those archives. It's not as straightforward as writing Yara rules. We'll get into exactly

kind of how that works. And there are more useful features in Clam, especially today, for this type of work that we're doing. My favorite: regular expressions. So regex just means pattern matching, and this is just useful everywhere. There are a lot of different flavors of regex. So if you're using something like Splunk, you may have a certain query language that you have to use that's a little bit different than something like Elastic, or Suricata, or Yara. I think that being able to write regular expressions is one of the best foundational skills you can have as a detection analyst, or really anyone working in information security. It used to be that we

would tell people, do not write your entire rules in regex, because it was a bit slow. And depending upon who wrote it and how they wrote it, it could really consume a lot of system resources. But RE2 from Google is extremely fast, and we've been able to kind of utilize a lot of this in our different systems. There's a bunch of different applications of that, and we're big fans. So the areas of discussion today: we're gonna talk about how to detect where the page lives, how to detect attributes about the page, the behavior of the page, the resources that the page loads

up. There's a lot of detection work you can do before you actually even get to the page that can help alert you to something going on. We can write detections on what the page looks like. We can write detections based on the phish kit source and the network communications. So, jumping right in. Taking a look at this domain, this is certainly not the real Chase domain. At least the beginning is, but then you have this login-microsoft with an x, onlinewebdl.ga. This has been a tactic that we've seen used for a number of years, and it's especially successful on mobile devices, as the rest of that URL and that hostname will basically be off the screen when the user looks at it in their

browser. So, starting off with something very simple like a regex: we would have the literal string of chase, and we would escape the dot, as in regular expressions the dot has special properties and can mean anything. Literal string of com; again, we'll escape that dot. And then we can say dot, 20 or more times. A great tool to test this out is Regex101, just a website you can go to to kind of test out your little regexes and whatnot. I use it all the time for little stuff like this. There's a lot of different flavors, and there's a lot of nice little tooling; if you're using match groups, it'll colorize them and whatnot. But this is a great way just to kind of

test your stuff out. So this will be useful if you're using something like Splunk or Elastic, or you're grepping through your Apache logs, or maybe your Blue Coat logs, and you're looking for an interesting domain. And this would work for any brand: Bank of America, M&T Bank. We see this used a lot for financial institutions. So, has anyone seen a Suricata rule before? Not yet, not yet. All right, let's move on. Just kidding. So we're going to go through this really quickly just to kind of make sure that the general structure of the rule makes sense. So first we have what we want to do with this rule. And in this case, we want to alert, which says just write

it out to a log somewhere. Then we define the protocol. This rule is written for the HTTP protocol. There are lots of other ones you can write for, but for our purposes, this is going to be HTTP traffic. Then you can define the directionality for this rule. When you configure your Suricata sensor, you define your HOME_NET as the thing you want to protect. So that's the network where you say, hey, this is where all my users are; I want to make sure that anything going out or coming in is monitored at this point. EXTERNAL_NET is typically anything that's not HOME_NET. And then we say any, any for the ports, because HTTP can run on

any port. It doesn't have to run on port 80; it doesn't have to run on 443. We have parentheses to kind of wrap the rest of the rule. Then we have a totally arbitrary message that just gives you an idea of what this rule detects. You can choose what goes in here; you can make it as complicated or as not complicated as you want it to be. I would recommend that you add some descriptive things in here, because if you just put something like Trojan.Generic in there, when that fires off, you'll be like, great, I don't know what that means exactly. Then we have a flow statement, which can help sort of

understand which side of the traffic it wants to look at. So in this case, we say, hey, look at the side of the traffic that's going to the server, and only in established connections, like a TCP established connection. Then, if we're detecting on chase.com, this could be a noisy rule if you actually hit a phish, and I don't want this to blow up my logs necessarily. So I'm gonna use something called a threshold, which can say, hey, only tell me about this one time per source host for 30 seconds, and then you can tell me again after 30 seconds. So this is a way you can kind of cut down on some of that noise to make things a bit easier for yourself. Then

we can get into the HTTP buffers here. So we can say we're looking for a GET request, and we're looking for a host that has chase.com in it. Now, we don't want to just say chase.com, because that's going to fire off on all sorts of stuff, especially if we're breaking SSL or TLS. We're going to say, hey, make sure that there are 20 bytes of data after chase.com, as that typically doesn't occur unless we're looking at some sort of social engineering thing. And then we say fast_pattern, because we're telling Suricata that this is the most unique thing in my rule.
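The host check at the core of this rule, a literal chase.com followed by 20 or more further bytes, is the same regex idea from earlier. As a rough Python sketch (not the ET rule itself), it can be prototyped with re before it ever goes into a signature; the hostnames below are hypothetical examples modeled on the talk's screenshot:

```python
import re

# Literal "chase.com" (dots escaped) followed by a dot and 20+ more
# characters, which rarely happens in a legitimate Host header.
PATTERN = re.compile(r"chase\.com\..{20,}")

def looks_like_deceptive_host(host: str) -> bool:
    """Return True if the hostname abuses a 'chase.com.' prefix."""
    return bool(PATTERN.search(host))

print(looks_like_deceptive_host("chase.com.login-microsoftx.onlinewebdl.ga"))  # True
print(looks_like_deceptive_host("www.chase.com"))                              # False
```

The same pattern swaps in for any brand (bankofamerica, mtb, and so on), which is why one decent detection covers so many phish.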

After that, we have some metadata just saying, hey, this is the class type of this rule. Certain properties in Suricata get applied to this rule when you give it certain class types. Then you have a signature identifier and a revision, just saying, hey, here's how many times this rule has been updated. So you take all that, you smoosh it all together, and you get this. You can share it with communities like Emerging Threats or the Snort mailing list, and everybody wins. This is a rule from 2016 that is still in the ruleset and still hits pretty well; the initial example was from a couple days ago. So, does anyone recognize this domain? IC0.app?

Me neither. I hadn't heard of it until I looked it up. But when you hit this page, we get this weird kind of "installing internet computer, validating service worker" page. When I saw this, I was like, this is terrible. But it's actually not; after this page, it would lead to this OneDrive phish. This is actually a blockchain, web3 sort of technology called Internet Computer. And this is a great method for hosting phish. One of the things that we've found out: if you have one of these domains and you try to access it with a security tool, like some sort of sandbox, and you get stuck on that validating worker thing, you can just add this raw subdomain right before the ic0 and it bypasses that. So

if you ever run into that, that's just a little trick to kind of get you through there. So what we want to do here: maybe we're not breaking TLS in our network, or we don't have the ability to see that sort of thing, but we do see DNS. We can still kind of alert on this activity with what we call a hunting rule. Now, a hunting rule is not going to exist in your malware ruleset or your phishing ruleset. This is just something that you as an analyst say, I'm interested in this; it's not gonna generate a ticket for me. I might not spend a ton of time on

it, but if I see this hunting rule in combination with an informational rule in combination with a malware rule, now I'm interested, right? So you're just building context around the data and the traffic that's going on in your network. This is really easy. So we just take the buffer of dns.query, we say that that buffer ends with ic0.app, and we're good to go. You can apply thresholds on this; you could do other stuff too. But this is just a real basic rule that can kind of give you an idea of what's going on. And we see technologies like this pop up all the time. But we always see that they take

what they were using before and just put it in this new box. Like, there's always a new box. They take this OneDrive phish that we've seen for 10 years, or maybe not 10 years but a long time, and they put it in this new box. Let's move on to page attributes. So, does anyone know what phish kit development tool leaves these artifacts behind? This was somebody cloning a Google page. Browsers make great phish kit development tools, and these are artifacts that are left behind by Chromium. So this "saved from", if you right click on the page and save as HTML, you get this artifact at the very top. This "saved from url=" includes a decimal

representation of the number of bytes that follow. And then any scripts or images will be prepended with this underscore-files thing. Conversely, if you see one that doesn't have the dot, that's actually by Firefox. So knowing these sorts of things can kind of give you more insight into what's going on. And being able to write a rule, I don't have one on the slide here, but this _files/ is just a great thing to look for. You can look for it in combination with other things, like script tags or whatever. That will give you an idea that you might be looking at a cloned site. So why even have a phish kit? Why even go through all the trouble of buying

or downloading or ripping a kit off of the internet? If you're trying to impersonate a brand, just save the page and edit the form. We see this all the time. Man-in-the-middle apps kind of fall into this category, but they present different challenges. There are always ways to kind of detect those as well, but they can be more interesting; we're not gonna get into those in this talk. But here we have a nice looking MetaMask page where we would maybe want to unlock our wallet. But if we took a look at the page source, down at the bottom here we can see that there's a "mirrored from" statement for the HTTrack website copier tool. This is an extremely common tool that we see used in

this sort of manner all the time, because it's literally made to copy websites. If you notice here, they're not even copying it from MetaMask; it's copied from a different phish page, websites.re, which is also interesting. And then you get a nice timestamp of how old this page might be. All useful data. So how would we write detections on this? We could make a very simple regex, and we'll introduce some flags here. We can say s would mean that a dot can equal anything, including newlines, and i is saying that, hey, this can match in any case. And we can say, hey, look for this comment starting with any number of spaces, zero or more

times, "mirrored from", anything zero or more times, and then this "by HTTrack Website Copier". A very simple sort of regex that you could make to detect this. We could also utilize a Suricata signature for this. So we could say alert on HTTP when data is coming in from the external network to my home network. So this would mean I'm looking for things that are in the HTML source, and just look for the "mirrored from" comment. We put a tag of nocase on there in case the case is upper or lower, and then "HTTrack Website Copier" slash. We use a modifier of distance zero to just indicate that this comes after the "mirrored from" content that we previously defined.
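The "mirrored from" comment lends itself to exactly the regex described, with the s and i flags. A minimal Python sketch of that pattern (the sample comment here is a reconstruction; real HTTrack output varies slightly in spacing and version string):

```python
import re

# "<!-- mirrored from ... by HTTrack Website Copier" with re.S (DOTALL,
# so the dot can cross newlines) and re.I (any case).
MIRRORED = re.compile(
    r"<!--\s*mirrored\s+from\s+.*?by\s+HTTrack\s+Website\s+Copier",
    re.S | re.I,
)

sample = """<html><body>...</body>
<!-- Mirrored from websites.re/metamask/ by HTTrack Website Copier/3.x -->
</html>"""

print(bool(MIRRORED.search(sample)))  # True
```

The lazy .*? in the middle keeps the match from running past the first copier credit when a page has several comments.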

This is a great sig. We can also use something like Yara. We're gonna get more into Yara rules, but this is just a very, very simple example. We could say that we're gonna define these three strings, and then in our Yara rule we can define a condition that says we definitely need this "mirrored from" to be in our page, but we'll take either one of the others, either this XRNCO tag or the "HTTrack Website Copier", because, you know, we see this all the time: phish kit authors will take tools that other people have written and put their own names on them, if you can believe that. I know, it's not a joke, it's so

serious. But yeah. Getting into obfuscation: interestingly enough, as of about a year ago or so, the overwhelming majority of phish pages that we saw weren't bothering to obfuscate their landings whatsoever. And this wasn't really the case historically. I wrote a white paper for Proofpoint in 2016 that kind of detailed all of these different common obfuscations that we were seeing in phish kits at the time. But we've been seeing that phishers have been relying much more on redirection and free hosting. Meaning that where you would see just a post to a phish landing hosted on, like, a compromised WordPress that had a form PHP: you submit your credentials, it goes off to the server

side, it gets sent out to an email address. Now we see stuff like: the user gets sent an email that has a OneDrive link. They click on that OneDrive link. That OneDrive link goes to Evernote. That Evernote goes to Glitch.me. That Glitch.me goes to a Bitly-shortened link, which then goes to something else, you name it. We've seen recipe sites abused for phishing; we've seen team building and educational sites abused for their free services. So as we saw this and as we started writing more detections, we started noticing that more obfuscation started to happen, because when you put JavaScript in the clear, it's really easy to see what's going on. So this is literally

a tool called JavaScript Obfuscator, and it's a very, very great tool. So we can take a snippet like this. This is directly from a phish kit: OU is Outlook, HO is Hotmail, GM is Gmail, and so on. We can take this and encode it to this. And the great thing about this, that all of the users of this tool are benefiting from, is that there's a GUI at obfuscator.io where you can take some JavaScript, just drop it in, define a whole bunch of toggles for how obfuscated you want this thing, and it will spit this out, and they can just copy and paste it right back in. An enterprising individual might notice that

there are still some strings visible down here. But this is like the medium level of the obfuscation; it gets significantly worse than this, and there are a lot of toggles that I didn't select to kind of show what this looks like. And something that makes this super fun is that this tool is used all over the internet. That's what it's for: it's for obfuscating the source code of JavaScript. It's used a lot in advertisements, used by companies like EA.
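One quick way to triage a page for this obfuscator's output is its characteristic variable naming, which the talk returns to shortly. A minimal Python sketch of that idea (the keyword list, the 4-6 hex-digit range, and the sample snippet are assumptions based on common samples, not a spec for the tool):

```python
import re

# obfuscator.io-style identifiers: _0x followed by hex digits, usually
# appearing right after keywords like function/var/return.
VAR = re.compile(r"(?:function|var|return)\s*\(?\s*_0x[0-9a-f]{4,6}", re.I)

# Hypothetical snippet shaped like the tool's low-obfuscation output.
snippet = "var _0x1a2b3c=['OU','HO','GM'];function _0x4d5e6f(){return _0x1a2b3c;}"

# Require several hits to cut down on one-off false positives, since
# this tool is used legitimately all over the internet.
hits = VAR.findall(snippet)
print(len(hits) >= 3)  # True
```

Requiring a minimum hit count is the cheap version of the "any three of these five" condition that a fuller rule would express.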

So it's interesting that you have to kind of read through all of these false positives if you're going to do these sorts of detections. And we do have some decoders. I think the most reliable that I've found right now is Synchrony, which has a GUI as well. You can paste it in there. Usually you don't get back exactly the code that was obfuscated in the first place, but you get kind of close most of the time. So yeah, let's talk about detecting this junk. So this is the lowest level of obfuscation for this tool. And if you take a look, there's a nice little array of strings at

the top that is not obfuscated whatsoever. So you might be able to just get away with saying, hey, if I see "function _0x" and "logo.clearbit.com", which is a site that you can utilize to pull in a logo for many different brands of websites (we see that used a lot in personalized phishing), you might be able to get that far. But let's detect the obfuscation itself. So most of the time we're just looking for patterns, and typically we have a variable pattern in encoded text. All of the variables are encoded with this underscore zero x and then six hex characters, which we can represent as that regex there. If we look through that and we kind of parse it out, we can see that these

variables come after strings most of the time. So we have "function" space and then that underscore zero x, "function" paren, "var", "return", all these different strings. So we can combine this together into a Yara rule, and it would look something like this. So then we can just say, you know, Yara, run this rule file against this HTML. This is running, obviously, just on my computer, but it can be run in an automated fashion as well. And then we'll see this rule fire off. So let's break down this rule a little bit. You start off with the string "rule", and then whatever you want to call this rule. Then you have three different sections here, the

first of which being the meta section, where you can enter key-value pairs of really anything you want. You can say what the author name was, what the date of the rule was, any references to blogs or anything like that. Then we define a strings section with a bunch of different things that we want to kind of pull from. So we can create this bucket of things that we're interested in. And then in the condition, we can go ahead and say we want two of these and two of those, or three of these and one of those. We can kind of be

creative with how we're matching our detections there. We can also use a regular expression in here, and then we can define a condition. So what we're saying in this rule is: any three of these five constants, and this regex has to match. Those are the conditions on which this rule would fire. So you can get way fancy with Yara rules. You can do a ton of different regexes and whatnot, but this is just kind of an example of what one might look like. One drawback to using Yara is that Yara doesn't exactly handle whitespace very well unless you're utilizing a lot of regex, and whitespace is kind of the enemy of the person that's detecting anything in HTML. We'll get into that with Clam.
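The talk later describes ClamAV's normalized-HTML target as compressing whitespace, lowercasing, and stripping comments. This toy Python approximation (not ClamAV's actual normalizer, and the form action is just the example from the talk) shows why normalization sidesteps the whitespace problem that trips up fixed-string matching:

```python
import re

def normalize_html(html: str) -> str:
    """Very rough approximation of ClamAV-style HTML normalization:
    strip comments, collapse whitespace, lowercase everything."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)  # strip comments
    html = re.sub(r"\s+", " ", html)                    # collapse whitespace
    return html.lower()

# The same form tag a kit might print with arbitrary spacing and casing.
raw = '<FORM   Action="darkx/mainnet.php"\n        METHOD="post">'
print(normalize_html(raw))

# After normalization, a plain fixed-string match is enough:
print('action="darkx/mainnet.php" method="post"' in normalize_html(raw))  # True
```

Without the normalization step, every extra space or capital letter in the kit's output would demand another regex in the rule.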

So let's talk about behaviors. One thing that every phish page needs to do is send stolen creds somehow. Generally today, ranking from most popular to least popular, the ways this is done: typically we see HTML form posts to a PHP script. This has been done for years and years and years. It's just a form tag; action will equal either a locally hosted or remote PHP file. If you're able to access that PHP file, you'll see that generally it's just a mailer script which will send whatever input it gets out to an email address. Over the past five years or so, we've seen Ajax XMLHttpRequests used a lot more. This

is because we're seeing that phishing pages are being hosted on places that don't allow form posts, so they have to utilize something else, like JavaScript. We still see writes to local files. This was the first way that credentials were stored: you would compromise the server, you would put up the phish page, all the credentials and everything that got entered into that page would be written to a local text file, and then you would scrape that file if you were the phisher. We still see this used fairly often. And then we have things like Telegram or Discord, which definitely happen. For whatever reason, I see this on the client side. So their bot token and bot chat ID are being shown in the HTML source, which means you can do

all sorts of stuff with that bot, or you'll see Discord webhooks. So we did say that we're gonna do some detections with ClamAV; let's talk about what that looks like. If we take a look at this form action, we have this darkx/mainnet.php with a method of post. We don't necessarily know right off the bat: is this something that happens all the time? So what you need to do is have some sort of repository, something you can search through, to determine: yes, this string is super common, or no, this string only appeared once in the past year. So we can see that here we have 321

results for this string, so it's probably a good candidate for

a rule. Don't run away. All right, so this is gonna be more interesting than a Yara rule, I think. So let's go through how the ClamAV rule actually works. You start off with a rule message; call it whatever you like. Usually we see the format of ruleset.description.date utilized here, and you separate your different sections with a semicolon. Then we define the engine. ClamAV has lots of different engines, and what we wanna do here is define the rule based on the functions that we know are in those engines. For this rule, this can go back all the way to 0.81 for Clam; 0.255 is not released. We're just saying that this rule works on this wide array of Clam

AV engines. We'll see a rule later that utilizes a function that's in a newer one, so we have to define that span differently. ClamAV uses targets, and it does interesting things to all of these different targets. So if we say we're looking for a target of zero, that can match on any file. But we can use a target of three in this case, which we did, which will match on normalized HTML. The way that ClamAV normalizes HTML is that it compresses whitespace, it strips whitespace from JavaScript, everything is lowercased, and the comments are stripped, which is unfortunate, because sometimes there's good stuff in those comments you want to write rules on. But you get the benefit of

not having to deal with whitespace. Then we define what our condition is here. So this is a very simple rule, and we're just saying that, hey, these two things have to be present. But as you write rules and refine them, you can get very creative with what you want to say. Now, 0 is this first content here, starting with 3C6, and the second content, 1, is the next content match. What these are is just hexadecimal; Clam needs hex in your content matches. And then there's a ::i indicating that this could be in any case. Clam gives us a cool tool called sigtool, which we can just paste a rule into, because, like, I read hex, but

I don't really read hex. So it's a lot easier if I can just paste it in here and it can tell me exactly what's going on, which is great. Let's take a look at the Ajax form post in JavaScript. A lot of times we'll see things like this: you've got this big block of JavaScript. At the top, it's grabbing that email value and that password value, and then it's forming it up and passing it over, utilizing the Ajax that's seen on the right. One great way we can write a rule on this is with a Suricata rule. Since this is coming over the wire to our users, we could definitely write

some detections on this. One thing to keep in mind when you're writing Suricata rules is that Suricata has a definition of how deep into a piece of content it can read. By default, I believe it's 64K, but you can change that to be whatever you want. But just keep in mind that if you're writing rules for a lot of different users and different sensors, if you have a content that you're gonna match on and it's like a mag down, and your sensor is only gonna scan that first little bit, you're not gonna get matches. So this kind of puts together all of the different things that we talked about with the Suricata rules. So we

have, we're matching on file data, which is the, which one we use the HTTP protocol, that's the HTML source. So we're saying any HTML source that contains these Ajax, this Ajax string here, this URL string with PHP 50 bytes after the URL, we've got a little bit creative of how we define that. And yeah, and then we're saying that at the end, The string password should be within 100 bytes of the string data colon. So all of those things together make for a pretty good generic signature. We could define this in Yara in kind of a similar way. So we could just say, hey, let's define two constants. We definitely want to make sure we have password in there because that's the type of role we're looking for.
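As a sketch, and assuming the two constants are literally the word password and the common jQuery $.ajax( call (the talk doesn't spell out the exact strings), a minimal version of that YARA rule might look like:

```yara
rule Ajax_Credential_Post
{
    strings:
        // the field name we expect in a credential phish form
        $password = "password" nocase
        // the common jQuery Ajax call used to ship the creds out
        $ajax = "$.ajax(" nocase
    condition:
        all of them
}
```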

We definitely want to make sure we have this common Ajax string in there. And then we'll say, sometimes these have the data type and the URL but not the type, or they might have data but the type is defined in a different way, like it's a GET, not a POST. So we want to be flexible in our rules when we're writing them, and we can do it in this manner.

Just a little bit about Telegram. This is a pretty good looking NatWest phish. If we take a look at the source of the page, we have a whole bunch of interesting information that I don't think should be on the client side. You gotta put that stuff server side. Otherwise people like me, who are very lazy, can just look, see exactly what's going on, and mess with your bot. All right, so at the top there, they're pulling out a bunch of values, like the customer number. There's a one-time password field in there that they'll extract. They're defining the token and chat ID for their Telegram bot right here. If you don't know, there's open API documentation for how you interact with a Telegram bot. You can take this token and say, hey, tell me the name of this chat, and it'll be like, okay, cool, here's the name, and this Telegram user is the administrator.
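If you've never played with it, the Bot API is just HTTPS requests against api.telegram.org. A little Python sketch for building those lookup URLs; the token and chat ID below are made up, but getMe, getChat, and getChatAdministrators are real documented methods:

```python
from urllib.parse import urlencode

def bot_api_url(token: str, method: str, **params) -> str:
    """Build a Telegram Bot API URL: https://api.telegram.org/bot<token>/<method>."""
    url = f"https://api.telegram.org/bot{token}/{method}"
    return url + "?" + urlencode(params) if params else url

# A made-up token in the usual <bot_id>:<secret> shape -- not a real one.
token = "123456:ABC-FAKE"

print(bot_api_url(token, "getMe"))                          # who owns this bot?
print(bot_api_url(token, "getChat", chat_id="-100200300"))  # what chat do creds land in?
print(bot_api_url(token, "getChatAdministrators", chat_id="-100200300"))
```

Fetching URLs like these with a token pulled from a kit tells you about the operator's bot without ever touching the phishing page itself.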

You can say, okay, cool, tell me the permissions of this Telegram user, and it'll be like, oh, this is their user ID. You'd be like, oh, okay, cool, tell me about their profile picture. You can get a lot of information just using this token and the raw API docs from Telegram. So this stuff shouldn't be here. Then they define a big message. This is what they're going to send to the Telegram bot, formatted up in a nice way that they can ingest, with the one-time password and the mobile number and all that kind of stuff. And then they're sending it using XMLHttpRequest in JavaScript.

So let's take a look at ClamAV again. We said that ClamAV normalizes HTML, but what does that look like? It looks like this blob of stuff here. When ClamAV normalizes JavaScript, it strips out all the spaces and lowercases everything, which is actually great for detection, maybe not so great for looking at it and presenting it at a conference, but I tried to highlight it there for you. So we're going to take these two interesting fields, the token and the chat ID, along with the URL of the Telegram bot. Let's break it down and go through the mindset of creating this rule. We've got three different fields here we're going to match on, and we're going to break those out into strings and regexes. We'll take the string token=" and regex the stuff after it. We'll take chat_id=" and regex the digits after that. And then we don't need this entire URL in our rule. This is where you can paint your own picture. You could say, well, I only want api.telegram.org, or I only want sendMessage?chat_id=. You can pick whatever you want here; I'm gonna pick the bot URL. So we convert these into hex, convert these into regular expressions, and that's what they end up looking like. In ClamAV we wrap our regular expressions in slashes, and then we can use all of this to write our rule.

So we have our name there, put your stuff server side, the date, we have the engine definition, the target, and we're saying 0 and 1 and 2 and 3 and 4: all of these things have to be present for this rule to fire. We have our first content there, which we saw as token=, we're saying any case, and then we've defined our regex. In ClamAV you have to say where that regex belongs, and you do that by referencing the identifier of the string you want to tack it onto. So this one comes right after the zero item, which is the string starting with 746f. Same thing for the next part: the string starting with 6368 has a regex, and that one applies to item number two. So then if we take this and use sigtool to print it out and see what it looks like, we get this. It's much more readable than the hex, and it can hopefully make more sense of how this rule was put together.

Just like users reuse passwords, phishers reuse assets: things like images, logos, backgrounds, CSS, JavaScript. The same logo that's hosted on Imgur for this Microsoft phish or whatever is gonna be in that HTML email that you didn't detect. Utilizing hashes for these resources can be a bit tedious, but it can be extremely rewarding. An example of this: recently a technique came out called the browser-in-the-browser attack. I'm waiting for the browser-in-the-browser-in-the-browser attack. It's detailed pretty well here, and it's a neat little trick. But all of the resources to conduct this sort of attack are, of course, on a GitHub. You might operate under the idea that attackers would never use the resources that are directly on the GitHub, that they'd create their own images and do all that stuff. No, they would not. They're gonna take the path of least resistance and reuse exactly those assets. So when these new techniques come out and people put the information up on GitHub, a great thing to do is just to scroll through it, and if there are any files, grab the hashes and throw them in a file for later, because who knows when you'll need it.

Well, that day we needed it. There's a Spanish bank phish that is a browser within a browser, and it looks pretty good. You have the actual URL of the bank up top and all the information there. But the kit reused those resources that were on the GitHub, like script.js, and the little lock image up there is actually an SVG lifted directly from the GitHub. Because we had thrown those hashes in a file, figuring maybe someone would use them, we caught it when someone did. So hashes can be a very valuable thing.
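Banking those hashes takes about five lines. Here's a sketch that emits entries in the md5:filesize:name format that ClamAV's .hdb hash databases use; the signature name in the example comment is made up:

```python
import hashlib
from pathlib import Path

def hdb_line(path: str, sig_name: str) -> str:
    """One ClamAV hash-database (.hdb) entry: md5:filesize:signature_name."""
    data = Path(path).read_bytes()
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{sig_name}"

# e.g. run over every file in a freshly published proof-of-concept repo:
# print(hdb_line("script.js", "Phish.BitB.script_js"))  # name is illustrative
```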

And ClamAV is all about hashes. When you say ClamAV, a lot of people are like, oh, that thing that detects hashes, right? And we're like, yeah, well, it does other stuff too, but cool. So here's just a test HDB, a hash database. We define an MD5 hash, we give it the size of the file, and then we give it a descriptive name: Phishwave, browser in the browser, script.js. Then we ran it against a sample.gz, a gzip file; we ran file against it there to show that it's GZIP compressed data. And then we said, hey, ClamAV, scan this sample.gz with the test.hdb file, and ClamAV goes, I found your script. So that's just a neat way of showing that ClamAV can unroll and detect things that are within archives.

Getting into the journey to the phish. A lot of people focus on writing rules about the phish landing page itself, or about the code that's underneath it, or what happens after it. But these days there's a lot that happens before you actually get to the credential phishing page, and you can write a lot of decent detections there that give you an idea of what's going on before anyone ever reaches that page. In this one we have a little tree where we can see that a user clicked on a onedrive.live.com link and we loaded up a OneDrive page. Cowboy chicken, wood fire, rotisserie: sounds delicious. I'm gonna click it all day long, because I wanna view those documents. But then, sadly enough, it's an Adobe phish.

So one thing you can do here: in Suricata we can define relationships, kind of. If I see this happen a lot, say OneDrive redirecting to something like Backblaze B2 or glitch.me, one of these commonly abused services, we can write a rule that ties these things together using what Suricata calls xbits. Xbits keep a little piece of per-host state, tracked on the source or destination IP, and let your rules use that state. So these are two DNS rules, and they work together. We say: look for the DNS query for onedrive.live.com. That happens all the time, and I don't want you to actually tell me about it, which is why there's no alert from that rule; just save it off in memory. That's defined with the xbits statement, which sets this flag and makes it available for other rules to use. The second rule then uses that bit. It says: now I'm looking for a DNS query ending in backblazeb2.com, and if the tw.onedrive xbit is set for the same source IP, fire. I put an expiration of 30 seconds in the xbits statement, so this says, hey, alert if that same IP then goes to backblazeb2 within 30 seconds. It's not a 100% rule. It's not going to detect it every time, and there are a lot of other things that could happen. But it's a great informational rule that can show you what's going on. So if we run this on the PCAP generated from that sandbox run, where we clicked on cowboy chicken and went to the Adobe phish, we see it fire.

And we've decided that we're going to keep a list of all of these commonly abused web services. We have something like 380 of them that we see abused for phishing commonly right now.
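Reconstructed from that description, the OneDrive-to-Backblaze pair looks roughly like this. The sids and the exact xbit name are my own placeholders, and this assumes Suricata 5+ keywords like dns.query and endswith:

```
# Quietly remember (no alert) that this source IP looked up onedrive.live.com
alert dns any any -> any any (msg:"PHISH xbit set - onedrive.live.com DNS query"; dns.query; content:"onedrive.live.com"; xbits:set,tw.onedrive,track ip_src,expire 30; noalert; sid:1000101; rev:1;)

# Fire only if the same source IP then resolves a domain ending in backblazeb2.com within 30s
alert dns any any -> any any (msg:"PHISH OneDrive lure redirecting to Backblaze B2"; dns.query; content:"backblazeb2.com"; endswith; xbits:isset,tw.onedrive,track ip_src; sid:1000102; rev:1;)
```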

So we released those publicly, and I was like, you know what I'm going to do? I'm going to write Suricata rules for all of these relationships. Well, it turns out that's about 143,000 rules coming out of my script, and you don't want to load 143,000 DNS rules into your sensor. It just takes a little bit of time to start up. But this is what it looked like. The idea was to take each of these commonly abused services, like 000webhostapp, set an xbit on it, and then have every other domain in that 380-domain list check whether that bit had been set prior: going from 000webhost to 100webspace, from 000webhost to 123formbuilder, and so on. It worked in theory. I don't know that I would deploy it, but the list is up there if you have any ideas or wanna do anything interesting with it.

All right, let's get into visual similarity. We're not gonna get into ML stuff; there's plenty of that on GitHub. We're gonna talk about perceptual hashing, which for our purposes means a fuzzy hash for an image. The more similar two hashes are, the more likely the two are the same image. We can also calculate a Hamming distance, which lets us say that I want this to be only five bits different, or only ten bits different, from my known hash. So what we can do is build a library of known image hashes and then compare the new images we see against that library to create quote-unquote rules. Now, this should not be the only thing you do. You should have a lot of other detections; use this as one layer of the lasagna you're building. But this has been used in DFIR for a long time, notably for finding images that look similar to known things you might be searching for.

ClamAV actually added support for this in the latest release, with some early caveats. It doesn't support a Hamming distance other than zero, so it has to be visually the exact same image. And you have to create the hashes within ClamAV, because different hashing tools can produce slightly different values. So we have this very typical looking PayPal page that's been phishing us for a decade. What we can do is create an image fuzzy hash by running clamscan and generating some JSON for this file, paypal1.png, and we get that fuzzy hash down at the bottom. We can then take that fuzzy hash and port it into a rule. So we have a ClamAV rule called Phishwave.PayPal, you know, M65: we have 65 variants of this. And we have to define our engine span differently here. Since this method was just released in the latest engine, we have to say that engines prior to functionality level 150 shouldn't load this rule, because they don't have the technology to actually run it. And then we say, hey, look for this fuzzy image.

So we SHA-256 hashed these two files, paypal1 and paypal2, and you can see they're different hashes. But when we scan paypal2 with the fuzzy hash we created from paypal1, we get a match. The bits underneath these files are different, but visually they are the same, which can be a very easy way to start building a library of screenshots or images.

Moving into user interactions. Generally, these days, when creds are stolen they're sent over

the network, usually via TLS or SSL. We have a lot of campaigns coming in through SMS that require a mobile user agent, or they might use some service like KillBot that checks to make sure you're coming from a mobile device, or else it just kicks you out to Google or something like that. If you can see these things on the networks you're monitoring, you can detect them.

So let's take a look at how we can detect a typical POST to a PHP endpoint. Now we're looking at screenshots of Wireshark. The red at the top is the request, and the blue at the bottom is the response back from the server. So Charles Xavier has submitted some information in the top image, with his email address and whatnot. When we look at this, we have to ask: what are some interesting things we could use to build a signature? In the top POST there, we have three things that could be useful. We know this is an HTTP POST, so information is leaving the network in the HTTP request body. We also know there are two folders here called usps, along with what looks like a USPS tracking number behind them: one in the URI and one in the Referer. Now, we don't know yet that this is necessarily gonna be a good token to search on, but we'll take a look at how to determine that in just a second. We also have this interesting 302 at the bottom, a redirect, and it's a redirect to a location called billing.php. How popular might that be? Could it be useful in a signature? In the bottom one we have a POST of some credit card information. This is fake, generated credit card information, so don't take it and go shopping on eBay. There are all sorts of iterations of this kind of thing, but usually it's a credit card number, an expiration, and a CVV.

urlscan.io, if you've never seen it, is a fantastic place to submit URLs and get all sorts of information back about what happens when you open that page. They also provide a really great way to search through aspects of that information. So we said, hey, we saw that /usps/ folder; is it commonly abused? We can search for it. And okay, yeah, we see there are some anti-bot scripts, anti2, anti4, there's the ID right there, and we've got a verification page from what is obviously not a USPS email address or domain. That one there looks pretty phishy, and it might be good content to use in our detection. We could do the same thing for billing.php: how often does this show up? We can search urlscan for it. And that .ru hit is temporary hosting with a lot of abuse on it, same for swtest.ru. So these could be some good starting points for our detections.

So we might write a couple of rules like this. And remember, all your rules don't need to be big mega killer rules, right? You can create lots of little rules that will then form up into one Megatron detection. Our first one is just that suspicious POST to the usps folder. Very straightforward: an HTTP method of POST, usps in the URI, and usps also in the Referer. The second one, something I know is in the ET rule set and has been very good at generating some generic detections, is the suspicious 302 redirect to billing.php. Very simple: just a status code of 302 with a Location of billing.php. You're gonna get a lot of mileage out of that. And then finally, at the bottom, we have the credit card information that was contained in the POST. If you look down here, we have fields like ccnum, and then there's &exp and &cvv. We don't necessarily wanna match on those entire fields; let's just match on the things that are commonly going to show up. So ccnum could be one of your contents, then &exp following the ccnum, and &cvv following that. Those three things have detected tons and tons and tons of

credit card phishing. Getting into phish kit source. Finding phish kit source is extremely easy. You can go on GitHub and find a whole bunch of repositories, or you can monitor the links as they come into your network and start looking for the things you want to find. There are two main tactics for doing this, and both take advantage of the common workflow of somebody deploying a phish kit. They'll typically get a kit, it'll be in a zip file, they'll unzip it, modify the files they need to, zip it back up, upload it to the compromised website, unzip it there, and start doing their thing, and you kind of hope they forget to remove that zip file. I've seen phishing READMEs that literally say, do not forget to remove the zip. But everyone does, so the zips are there. So one way you can do this is by walking back through a URL and making requests, looking for open directories. You can also just append .zip to each of the folders in that URL and try to see if you can acquire a zip file, and this works a surprising amount of the time. There are tools on GitHub that do this sort of thing, such as StalkPhish, but if you want to build your own thing, I definitely recommend it.

Knowing what's happening on the server side helps you understand exactly what's going on, kind of like doing behavioral analysis on a piece of malware. You run it in the sandbox, you see that it does this thing, okay, cool; but if you actually reverse that piece of malware, you understand how everything works. And it's extremely valuable, because then you know what kind of countermeasures are in place. Are they using a service like KillBot? Are they using some sort of archaic IP list to block visitors? That stuff shows up all the time. How is the data being stored and transmitted? Where is the text file of stolen creds located? Who worked on this phish kit last, or who may have created it? And importantly, how old is this phish kit? One thing I love about zip files is that zip files preserve timestamps. So you can say, hey, show me the time delta between the oldest and the newest file in this phish kit. If that delta is super wide, and the oldest file isn't just something like jQuery, you might know this is a very old kit. If the time delta is very small, it might be something new.

Many of the files within those zip files can be hashed for detections on the thing you're monitoring. A lot of times you'll get images included within the archive, or scripts, or HTML, and you can just hash those and you'll see them on the wire as they come across. Sometimes these might be legitimate files, though. If you see a Microsoft phish that's got a Microsoft logo in there and you write a hash rule on that, it may be exactly the same hash as the logo from the actual Microsoft page, and you need a way to account for that. One thing you can do is build a repository of all this phish kit source, so you can say, hey, what's the hash of this? Search for it, show me how often it shows up in the phishing landscape. This one shows up a bunch, not every day, but often. And just like we wrote hash rules for the images before, we can take these files from the phish kit source and use them in our detections when we're not looking at kit source, but at things actually coming across our network.

So that was a lot of stuff, but just a quick recap. We talked about where the page lives and how you can write detections on domains. We talked about writing detections on the attributes of the page and how it commonly tries to hide its JavaScript right now. We talked about the different behaviors the page might exhibit and how we can write detections on those, as well as on the resources of the page.
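That zip-timestamp triage from a moment ago is easy to script with nothing but the standard library; the file names and dates in this sketch are invented:

```python
import io
import zipfile
from datetime import datetime

def kit_time_span(zip_bytes: bytes):
    """Return the (oldest, newest) file timestamps recorded inside a zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        stamps = [datetime(*info.date_time) for info in zf.infolist()]
    return min(stamps), max(stamps)

# Build a toy "kit": one stale bundled library, one freshly edited exfil script
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zipfile.ZipInfo("jquery.js", date_time=(2020, 1, 1, 0, 0, 0)), b"old")
    zf.writestr(zipfile.ZipInfo("post.php", date_time=(2022, 6, 1, 0, 0, 0)), b"new")

oldest, newest = kit_time_span(buf.getvalue())
print((newest - oldest).days)  # a wide delta suggests an old, lightly retouched kit
```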

Remember that a lot of detections can be made before you ever get to the phish page. And then when you do get to the phish page, you can write visual perceptual-hash rules on what that page looks like. There's always the network communications, and by accumulating phish kit source, you can definitely get a lot of great detections out of that.

So if you're still awake and you say, well, I'd like to get some of these resources, these are some excellent repositories and data points. PhishingKitTracker on GitHub has tons and tons and tons of phishing kits, and it's updated almost every day. OpenPhish provides a free community feed that's updated every 12 hours. TwinWave posts comments on VirusTotal for the things we've analyzed from the public space, so we can say, hey, we've tagged this with a comment: TwinWave detected this as phish, we detected it as USPS, and we detected it as the AK-47 actor (we use the actor names the actors call themselves). And then there are also the ET Open phishing rules, which are a great starting point if Suricata is exciting to you. Here's a link, not to Jeff Goldblum, where you can grab a copy of the slides and a text file that has all of the links from this presentation, so you don't have to go through the PDF to find the one thing you wanted to look at.

I think we're out of time, but thank you. Thank you so much. Cool. Yeah, if anybody has questions, just find me. I'll be hanging around. Thank you.