BECShield — a custom LLM Model for detecting Business Email Compromise

Uploaded: 2025-05-15
Duration: 31 min 36 s

BSides Ahmedabad · 2025 · 31:36 · 250 views · Published 2025-05

Speakers

Binil Kuriachan, Renuka Talegaon

Tags

Category: Technical

Style: Talk

Mentioned in this talk

Platforms

Azure Machine Learning

Frameworks

BERT, PyTorch

About this talk

Think a hacked email is no big deal? 🤔📧 Think again! What if it was your business email? 💼🚨 The impact could be massive: financial loss, data leaks & more! 💣💸📉 Binil Kuriachan & Renuka Talegaon took the spotlight at BSides Ahmedabad 🎤🌟 to present BECShield, a custom LLM model for detecting Business Email Compromise (BEC) 🧠🤖🛡️

Here's what they covered:
🔍 Overview of BEC
🎯 Common attack types
❗ Problem statement & smart solution approach
⚙️ Inside look at BECShield's architecture
🧠 How it classifies and detects threats
📈 Key performance metrics
🚀 Future roadmap
📚 Bonus: Helpful references

Transcript (English)

Hi, good afternoon. Today we will be talking about BECShield, a custom LLM model for business email compromise detection. Thank you.

First, we will introduce ourselves. I am Binil Kuriachan, and I work as a senior applied scientist at Microsoft. My expertise lies in machine learning, especially large-scale modeling, and I'm relatively new to cyber security. My current focus is modernizing the MDO ML stack. MDO stands for Microsoft Defender for Office, and I focus on email phishing attacks. In this work we'll be talking about BEC.

Thank you, Binil. Good afternoon, everybody. My name is Renuka, and I am based out of Bangalore, Karnataka. I have six-plus years of experience in the security industry and currently work at Microsoft. My areas of interest include email research. I have also spent some time in incident response and endpoint security, but predominantly it's email research. If you ask me about a day in the life of a researcher: I work on Microsoft Defender for Office, and my work revolves around writing detections, improving security measures and chasing new threat actors in email. So that was about me. Let's get started; Binil will join us again in the middle.

Let's go through today's agenda. We will start with an introduction to phishing emails. I'm sure this whole crowd works in cyber security, so everybody is familiar with phishing, but we will also go deep into business email compromise: what exactly it is, some of the types we see, and why BEC emails are so difficult to chase. What makes them special, such that they get past our detection stack? Next is the solution: we will see what exactly BECShield is and how it classifies business email compromise emails, and then some runtime samples, metrics and the future scope of BECShield. So let's get going. Also, please excuse my bad throat.

As we all know, phishing is a cyber attack in which attackers send emails to victims and lure them into clicking. Be it links, attachments or anything else, the victims are lured, and in turn the attacker harvests credentials such as usernames and passwords. Sometimes it goes beyond usernames and passwords, and the attacker collects personally identifiable information such as bank account names, account numbers, credit card information, etc. So it's fair to say that phishing remains the top threat in cyber crime. Whether you look at malware, ransomware or HTML-based phishing campaigns, the first thing they go through is a phishing email. Attackers start their attack with a phishing email because that is their point of delivery.

Forbes also reported a $500 million loss due to phishing, which increased by 179% in 2023. Not just this: according to the FBI, business email compromise contributed around $2.7 billion in losses, an increase of 10% in 2021. We will come to business email compromise, but first let's look at the trends. The average cost of a single BEC attack is around $100K, which seems small, but the overall impact of BEC is $2.7 billion. Compare that with ransomware, which costs only around $34 million: BEC losses are almost 79 times larger than ransomware losses.

Let's step back and understand what exactly business email compromise is. Any idea? Does anybody know what business email compromise is? Okay, let's get going. Business email compromise is a cyber attack in which the attacker's goal is quite evident: he wants usernames and credentials from the organization, so that the loss incurred is an organizational loss, a data loss or a monetary loss.

Now coming to the pattern: how does a BEC attack take place? The first step is identifying the target. As we all know, every cyber attack starts with a reconnaissance phase. In this first phase, the attacker identifies the target. If he has to target a particular organization, he starts collecting information on LinkedIn, sometimes on the dark web (because a lot of data is available there), or on websites that publish specific information about employees, and starts targeting those employees. The next step is social engineering. After collecting the information, the attacker is ready with his business email compromise emails. Once he starts sending them, he uses social engineering to interact with the user. For example, based on your advertising trends, the attacker may know that you are fond of gift cards, so he can craft his email around that and start interacting with you. In this phase, once the user receives the BEC email, the information exchange takes place. To give you a hint of what BEC emails look like: as the victim, I might get an email saying, "Hey, you have received a gift card. Would you like to reply?" That's how the attack chain goes on.

Another thing about business email compromise is that there are no malicious payloads. For example, when we talk about a regular phishing email, what would you expect to see? A malicious link, a malicious attachment, maybe a malicious-looking email. That is what tips you off. But a business email compromise email looks nothing like this. There are no malicious payloads in BEC, which makes it even more dangerous. In the information exchange phase, if the victim mistakenly shares any personally identifiable information, such as his phone number, email address, bank account details or home address, the attacker starts using that information.
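Because a BEC mail carries no link or attachment, a detector that only looks for payloads has nothing to work with. A minimal sketch of that gap, using only Python's standard `email` package: a naive scanner that extracts URLs and attachments finds nothing to flag in a hypothetical gift-card request, while a credential-phishing mail with a link is caught. The `payload_signals` helper and both sample messages are illustrative assumptions, not part of BECShield or Microsoft Defender for Office.

```python
import email
import re
from email import policy

# Naive payload indicators: http(s) URLs in text parts, plus MIME attachments.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def payload_signals(raw_message: str) -> dict:
    """Return the URLs and attachment filenames found in a raw RFC 5322 message."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    urls = []
    for part in msg.walk():
        # Only scan inline text parts; attachments are collected separately below.
        if part.get_content_type() == "text/plain" and not part.is_attachment():
            urls.extend(URL_PATTERN.findall(part.get_content()))
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {"urls": urls, "attachments": attachments}


# Hypothetical gift-card style BEC message: plain text, no link, no attachment.
BEC_SAMPLE = """\
From: CEO Office <ceo.office.private@example.com>
To: employee@example.org
Subject: Quick favor

Are you at your desk? I need you to buy some gift cards for a client today.
Reply to this email and I will send the details.
"""

# Hypothetical credential-phishing message for contrast: it does carry a link.
PHISH_SAMPLE = """\
From: IT Support <it@example.net>
To: employee@example.org
Subject: Password expiry

Reset your password at https://login.example.net/reset before 5 PM.
"""

print(payload_signals(BEC_SAMPLE))    # {'urls': [], 'attachments': []}
print(payload_signals(PHISH_SAMPLE))  # {'urls': ['https://login.example.net/reset'], 'attachments': []}
```

The empty result for the BEC sample is the point made above: the signal has to come from the wording of the mail itself, which is what a language model can read.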

The last phase is the transfer phase. The users either give away their bank account information or their personally identifiable information. Then it's up to the attacker whether he uses it for data theft or goes further to monetize it. So this is how a typical business email compromise attack works.

Let's look at some of the types of business email compromise attacks that we see. The first one is payroll fraud, which mostly targets HR and finance-related departments. The attacker sends an email stating that some payroll update needs to be made to your account. The moment the victim updates the account details, the attacker diverts all the funds that should have gone to the victim's account into his own account. The next one is gift cards. As I said, based on advertising trends or user behavior, the attacker sends emails stating that a gift card is available. Or, for example, a boss sends an email; this is all impersonation, and no legitimate emails are involved. The "boss" might write, "Hey, I need some gift cards for my company, can you issue some?" or "I already have some issued gift cards, can you redeem them?" This is how employees are lured and engaged in the email chain. The next is W-2 reports, which are tax reports. The attacker can impersonate a previous or current employee and email the finance department: "Hey, can you issue my tax reports?" He can then modify those reports and use them in future attacks. The next is cell phone number.

This is the most common type we see. If you work in email security at any org, you have probably seen users receive emails like "Hey, can I have your cell phone number? Can we quickly chat on WhatsApp?" The attacker's intention is to message you first, get your phone number, and then take the attack to WhatsApp, evading all the email detections. That is another type of business email compromise. The last one is action requests. The attacker impersonates someone at the CEO, CFO or leadership level and sends emails saying, "Hey, I have an urgent request. Can you fulfill this? Can you get this done?" As soon as an employee sees that the request comes from leadership, they tend to reply, and that's where the compromise happens. These were some of the types of business email compromise attacks.

Let's go through some examples we have seen. In the first one, the attacker asks the user to share his cell phone number. The second one is about gift cards: the attacker says he wants some gift cards and asks whether they can be issued. In the last one, you can see that the email comes from a free email domain, but there is some impersonation happening, and the attacker asks for the mobile number.

Moving on, let's look at the problem statement. So far we saw what business email compromise is. Why was there a need for a new feature here, and why did it have to go into a product? Business email compromise contributes around 25% of overall phishing campaigns. But as I told you, there is no sign of maliciousness in BEC: no malicious links, no malicious macros, payloads, HTML or attachments, nothing. The email is quite clean when it reaches you, and even you might not realize that it is business email compromise. Attackers are quite sophisticated these days. The first thing we see is that these emails originate from Gmail, Hotmail, Outlook, Yahoo, etc., and you can't block all these domains in your organization, because blocking them blocks all the other emails as well. That is one of the challenges. The second challenge is that the attacks are targeted. In cyber security we rely on trend analysis: if a phishing campaign is going on, you see a spike, say 1 million phishing emails in the campaign. Business email compromise doesn't work this way. The attacker might send only two emails to two employees and compromise them, so there is no trend to see. As I said, there is also a lack of malicious payloads. We don't know what the IOCs are for business email compromise; all we can rely on is the email pattern. Next, the attacks come from free domains such as Gmail, Yahoo and Outlook, so we can't directly block those; instead we need a feature that looks at these emails and enhances our tool. The last challenge is lookalike domains, domain impersonation and user impersonation. Say a user XYZ has left your organization, but the attacker still impersonates that particular user and domain and sends emails to you. This is very hard to detect with traditional methods, and that is why we are moving to machine learning. I will hand over to Binil to carry on. Thank you.

Before looking at the solution approach, let's summarize. We have established that phishing is still behind most, about 91%, of all successful attacks, and business email compromise is one of the major categories of phishing. The main point is that there are no malicious URLs or attachments, no malicious payloads: the malicious content is the email content itself.

So, on to the solution. Our objective is to build an advanced machine learning solution to detect BEC scams. As I said, the most important part here is the context. We have seen some examples of BEC emails, like gift card or cell phone requests. The key features that can explain or give us the context are the email content (the subject and the body), the mail headers, and the domain features. Together these tell us what an email is talking about and how far we can trust the domain or the sender. With this, we focus on building a large language model as the solution.

Before explaining large language models, let's talk about what a language model is; most of you are probably familiar with it. A language model is a machine learning model, and machine learning models are probabilistic: they generate probabilities. A language model tries to generate probabilities of words, or meaningful context about the language. For example, if you use WhatsApp, you get predictions as you type. The moment you type "I am", you'll probably get the prediction "fine"; type "how are" and you'll get "you", then "doing", and so on. All of those are language models: they predict your next word based on the previous context. Now, what is a large language model? It is again a machine learning model, but trained on a huge amount of text data. Any of us can build a language model with a small data set, but that model will only know the context of that particular data set. How can we build a model with a big context, a big knowledge base? That is a large language model. You all must have heard about ChatGPT, right? ChatGPT is one example of a large language model. We are not using ChatGPT; it's just an example. We will be building our own custom large language model. Have you heard about BERT? Anyone? Okay. BERT is another example of a language model, or a large language model, originally developed by Google. Similarly we have Turing and Llama; Llama is developed by Facebook.
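The WhatsApp-style next-word prediction described above can be illustrated with a toy bigram model in Python: it counts which word follows which in a tiny hypothetical corpus and turns the counts into probabilities P(next word | previous word). This is a deliberate simplification (one word of context, no neural network); models like BERT use transformer blocks and far richer context, so treat it only as an illustration of the probabilistic idea.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, prev_word):
    """Turn raw follow-counts into P(next word | previous word)."""
    following = counts.get(prev_word.lower(), Counter())
    total = sum(following.values())
    return {word: n / total for word, n in following.items()} if total else {}


def predict_next(counts, prev_word):
    """Return the most probable next word, or None if the word was never seen."""
    probs = next_word_probabilities(counts, prev_word)
    return max(probs, key=probs.get) if probs else None


# Tiny hypothetical corpus standing in for a user's typing history.
corpus = [
    "how are you",
    "how are you doing",
    "how are things",
    "i am fine",
    "i am fine thanks",
    "i am here",
]
model = train_bigram_model(corpus)
print(predict_next(model, "are"))  # you
print(predict_next(model, "am"))   # fine
print(next_word_probabilities(model, "are"))
```

A large language model replaces the count table with a neural network trained on a huge corpus, which is what lets it use context far beyond the previous word.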

Turing is developed by Microsoft, and there is Phi-3, which is also developed by Microsoft. For this solution, we will first discuss BERT and how it helps us. As we explained, BERT is a kind of language model, and it can be called a large language model because it was trained on a huge amount of text from Wikipedia and BookCorpus. You can imagine how much context Wikipedia holds. The model was trained on all of that to learn tasks such as next sentence prediction and masked language modeling. For example: given an input sentence, can you predict the next sentence? You train the model iteratively on this data, and after some time it is in a position to understand the context of multiple words and make meaning out of them.

How exactly does it make meaning out of them? Mainly because of the transformer block. The transformer block is the brain behind all the language and large language models you have seen: BERT, ChatGPT, Phi-3, etc. A transformer block is just a set of multi-head attention, add and normalize, a feed-forward layer, and add and normalize again. Let's not get into too much detail, but this is what helps the model understand what is in the text. Here is an example. Take the sentence "The cat sat on a mat." It's clear, right? Now, in ChatGPT or something similar, we might prompt, "Where was the cat sitting?" The model should be able to answer "on a mat", so it needs to know that the cat was sitting on a mat. How does the model learn this? It has to know where to pay attention. The important words were "cat", "sitting" and "mat". This is how transformer blocks work: they understand the sentence and make meaning out of it, so that the model can generate answers and similar outputs.

We have multiple transformer blocks here. Here is another relatable example. You are all familiar with face detection, which uses a neural network model to decide whether something is a face or not. What is a neural network model? It tries to mimic our brain with a set of layers, each with its own neurons. Why multiple layers? The first layer learns simple features. The next layer uses the simple features learned by the previous layer and builds on top of them. After five or six layers, the network learns that this looks like a face. The initial layers recognize an edge; after one or two more layers they know this is a nose, this is an eye; and at the end they know this is a face. So the network stacks layers, and in a similar way we stack multiple transformer blocks. Each transformer block tries to understand, or make some meaning out of, the text.

This is the typical BERT architecture. It comes in different sizes: BERT-base has 12 transformer blocks and BERT-large has 24. The key takeaways are that BERT is trained on a large set of data and has a wide context. It can also perform zero-shot cross-lingual transfer, which we will see later. And why are we using a pre-trained model for our task at all? We are not using it as-is. Our assumption is that this model already knows many words, their context, and how a particular word or sentence is used.

Now let's get into our solution. What is BECShield? BECShield is a custom deep language model that uses raw text to classify BEC emails. I call it custom because it does not use a pre-trained model like GPT or BERT as-is; we have to train it, that is, build a custom model ourselves. We use BERT as a base and apply transfer learning. What is transfer learning? You are all cyber security experts. When you tackle a new attack, even though you are starting from scratch, you still use your existing domain knowledge and expertise. Your knowledge is always useful when you shift to a new attack type, right? We are doing the same thing here. BERT is trained on something; we transfer that knowledge and tell it: now focus on detecting BEC attacks.

So this is BECShield. It looks very similar to what we saw in BERT: up to the transformer blocks, this is BERT. On top of that, we remove what BERT was doing. BERT was looking at next sentence prediction, or perhaps sentiment prediction, that is, what the sentiment of a sentence is. Instead, we predict whether this mail looks like business email compromise or not. We removed the top head from BERT and attached three dense layers, which are neural network layers. To develop this, we used machine learning libraries like PyTorch, and the entire model was trained on Azure Machine Learning using multiple components of the ML platform. Just to reiterate: BECShield is built on top of BERT using transfer learning, so whatever context the BERT model knew, we transfer to our own task, which is detecting BEC emails.

Once the model is ready, how does it detect email attacks? Assume we get billions of emails per day. We cannot classify all of them, so we have to be careful about how we put this model into the mail flow stack. Billions of emails come into the input mail flow, so we first make a down-selection to pick out the probable BEC traffic. You don't need to check each and every one of 15 or 20 billion emails per day; you filter down, because you cannot run this model on all of them. This saves infrastructure and also focuses on the most probable BEC, or most probable malicious, traffic. Once the down-selection is done, we do the data processing, which depends on whatever kind of machine learning model you are building.
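The "cat sat on a mat" intuition, that the model learns where to pay attention, corresponds to scaled dot-product attention. A minimal single-head sketch in plain Python, assuming hand-made 3-dimensional token vectors and a hypothetical query vector standing in for the question "where was the cat sitting?". Real BERT layers use learned query/key/value projections, multiple heads and 768 (base) or 1024 (large) dimensions, so the numbers here are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax: exponentiate, then normalize to sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights: softmax(q . k_i / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def attend(query, keys, values):
    """Attention output: the value vectors averaged with the attention weights."""
    weights = attention_weights(query, keys)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


tokens = ["cat", "sat", "on", "a", "mat"]
# Hand-made embeddings (hypothetical); the third component loosely encodes "place".
keys = [
    [1.0, 0.0, 0.2],  # cat
    [0.1, 1.0, 0.0],  # sat
    [0.0, 0.1, 0.1],  # on
    [0.0, 0.0, 0.1],  # a
    [0.2, 0.0, 1.0],  # mat
]
query = [0.3, 0.2, 2.0]  # stands in for "where was the cat sitting?"

weights = attention_weights(query, keys)
for token, w in zip(tokens, weights):
    print(f"{token:>4}: {w:.2f}")
print("most attended:", tokens[weights.index(max(weights))])  # most attended: mat
context = attend(query, keys, keys)  # here the keys double as the values
```

A transformer block runs several such heads in parallel and adds the normalization and feed-forward steps mentioned above; stacking 12 or 24 blocks gives BERT-base or BERT-large.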
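BECShield, as described, keeps BERT up to the transformer blocks and replaces its top head with three dense layers. A minimal plain-Python sketch of such a head, assuming a 768-dimensional pooled embedding (BERT-base's hidden size) as input. The hidden widths (256 and 64), the ReLU activations, the label list and the random initialization are hypothetical choices, not details from the talk. In a PyTorch implementation, the same idea would typically be three `torch.nn.Linear` layers on top of a pre-trained BERT encoder, fine-tuned together.

```python
import math
import random

# Hypothetical label set, based on the BEC types and examples discussed in the talk.
LABELS = ["not_bec", "gift_card", "cell_phone", "payroll", "w2", "action_request", "fake_thread"]


def dense(inputs, weights, biases, relu=False):
    """Fully connected layer: y = W x + b, optionally followed by ReLU."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def softmax(logits):
    """Numerically stable softmax over the output logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases; training would learn real values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class BECHead:
    """Three dense layers mapping a pooled encoder embedding to label probabilities."""

    def __init__(self, embed_dim=768, hidden=(256, 64), n_labels=len(LABELS), seed=0):
        rng = random.Random(seed)
        self.layers = [
            init_layer(embed_dim, hidden[0], rng),
            init_layer(hidden[0], hidden[1], rng),
            init_layer(hidden[1], n_labels, rng),
        ]

    def predict_proba(self, pooled_embedding):
        h = pooled_embedding
        for i, (weights, biases) in enumerate(self.layers):
            # ReLU on the two hidden layers; the last layer produces raw logits.
            h = dense(h, weights, biases, relu=i < len(self.layers) - 1)
        return dict(zip(LABELS, softmax(h)))


# Stand-in for BERT's pooled output for one email (random, hypothetical).
rng = random.Random(1)
pooled = [rng.gauss(0.0, 1.0) for _ in range(768)]

head = BECHead()
probs = head.predict_proba(pooled)
print(max(probs, key=probs.get))  # untrained weights, so this top label is meaningless
```

The design point is the split: the encoder supplies general language context learned during pre-training, and only the small head has to be learned from scratch for the BEC labels.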

Once the data processing is done, we feed that mail to the BECShield model, which is already trained and just waiting for data. It gives the prediction: whether this is BEC or not, and what kind of BEC it is. We already discussed the common types of BEC threats, like gift card, cell phone number, etc. Once we have this output, the intent, we apply some policies on top of the prediction. We combine the BEC prediction and the policies to create the final verdict. If the final verdict says this is a BEC attack, we block it immediately. If not, we allow it, or other solutions can take action.

Now let's see some actual real-time attacks. This mail was given to the model, and the model predicted that it is a fake thread. If you look, there are two mails; the second one is there just to give the impression that we were already in contact and already had this conversation. That was a fake thread: the attacker created the entire conversation in a single mail, so it was detected as a fake thread. The second one was detected as gift card, and one more example was detected as payroll fraud.

Now our metrics and future roadmap. When we deployed this model, or while we were working on it, we saw very promising results: a precision of 94.5% and a recall of 92.4%. A precision of 94.5% means that when the model says a mail is BEC, it is right 94.5% of the time. At the same time, we applied multiple policies to keep our FP rate low, and we were able to achieve something like 0.000050, which means fewer than one mistake in 10,000 emails: an FP ratio of 0.005%. In the future roadmap, we plan to add more relevant features, such as advanced down-selection, and to extend to wider traffic. In the interest of time, this is it. Any questions?
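The reported numbers follow the standard confusion-matrix definitions. A minimal sketch in Python with hypothetical counts, chosen only so that the formulas reproduce the figures quoted in the talk (94.5% precision, 92.4% recall, an FP rate of 0.000050); they are not the authors' data. It also shows why 0.000050 means fewer than one false positive per 10,000 good mails: it is 0.005%, about one in 20,000.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, recall, false-positive rate and miss rate from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),  # of the mails flagged as BEC, share that really were BEC
        "recall": tp / (tp + fn),     # of all BEC mails, share that were caught
        "fp_rate": fp / (fp + tn),    # of all good mails, share wrongly flagged as BEC
        "miss_rate": fn / (tp + fn),  # false negatives: 1 - recall
    }


def one_in_n(rate: float) -> int:
    """Express a small rate as 'about 1 in N'."""
    return round(1 / rate)


# Hypothetical counts (NOT from the talk): 1,000 BEC mails and 1,080,000 good mails.
m = classification_metrics(tp=924, fp=54, tn=1_079_946, fn=76)

print(f"precision {m['precision']:.1%}")  # precision 94.5%
print(f"recall    {m['recall']:.1%}")     # recall    92.4%
print(f"FP rate   {m['fp_rate']:.6f} = {m['fp_rate']:.3%}")  # FP rate   0.000050 = 0.005%
print(f"about 1 false positive in {one_in_n(m['fp_rate']):,} good mails")  # about 1 false positive in 20,000 good mails
print(f"missed BEC share {m['miss_rate']:.1%}")  # missed BEC share 7.6%
```

Precision and FP rate have different denominators: precision is measured over flagged mails, while the FP rate is measured over all good mails, which is why a very low FP rate can coexist with a few percent of flagged mails being wrong.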

Good afternoon, sir. My name is Adita Naut, and I'm a security research student from Amity University Rajasthan. My question is about the BERT model you are using. You said you use a down-selection method to filter the mails. Can you throw some light on that down-selection method: what rules you use and on what basis you filter them out?

Yes. Down-selection is usually a combination of machine learning and keyword selection. This is the place where we can apply all our domain knowledge: which distribution most of the malicious traffic comes from, what kinds of domains send malicious traffic, and what keywords or other signals, like sender reputation or other sender signals, are usually associated with malicious traffic or with the communication state. Those were used for down-selection, together with the machine learning model.

Sir, what is the current feature vector size you are using?

Sorry, the feature vector size? It depends on the tokenizer. Say we are using a BERT tokenizer; it will have thousands of features. For example, if you have trained this model on a data set of 1 million samples, we can choose the most relevant features, say 20,000 or 30,000, depending on how we want to define the model. So we limit it to, for example, the top 30,000 features. As for input size, we look at the top 500 tokens of an email.

Is there a particular feature selection algorithm you are using?

Yes, we are using the same tokenizer for tokenization, but we are free to use any featurization. This is just a start; tomorrow it could be a Turing-based featurization, or we could build our own featurization vector.

Hi, my name is Bala and I'm from Microland. Interesting approach to business email compromise. When you mentioned false positives, that is about what your solution intercepts and concludes, right? Whether it's a false positive or not?

Sorry, I did not understand your question.

When you say false positive: a BEC mail comes in and your solution identifies it?

No, a false positive is when we identify a good email as BEC.

Okay. So mine was the reverse of it.

What are the chances that an actual BEC mail passes through your solution without being caught?

Yes, false negatives. In the initial version we did see some false negatives; that is why recall was only around 90%, which means around 10% of BEC mails are still not caught. That is room for improvement. Recall means: out of all the BEC samples, how many get caught. So almost 10% of them are still missed. Since this is version one, we can only make progress in phases. That's why we plan to refine the architecture and add more features to increase recall and reduce false negatives.

Got it. There are more questions; maybe we'll catch up offline.

Yeah, sure. Thanks. Any other questions? Otherwise, thank you.
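The deployment flow from the talk (down-select probable BEC traffic, run BECShield, apply policies on the predicted intent, then block or allow) and the down-selection signals named in the Q&A (sender domain, keywords, sender reputation) can be sketched in Python as follows. Everything concrete is an assumption for illustration: the keyword list, the reputation score and its 0.3 cut-off, the 0.9 and 0.97 confidence thresholds, and the intent names. `stub_classifier` is a hypothetical stand-in for the model; only the free-mail domains come from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Free-mail providers named in the talk; used as a signal, never blocked outright.
FREE_MAIL_DOMAINS = {"gmail.com", "hotmail.com", "outlook.com", "yahoo.com"}
# Hypothetical keyword list for the down-selection step.
BEC_KEYWORDS = ("gift card", "cell phone", "phone number", "whatsapp", "payroll", "w-2", "urgent")


@dataclass
class Email:
    sender: str
    subject: str
    body: str
    has_url: bool = False
    has_attachment: bool = False
    sender_reputation: float = 0.5  # hypothetical score in [0, 1]; higher is more trusted


@dataclass
class Policy:
    block_threshold: float = 0.9
    # Hypothetical per-intent overrides that demand higher confidence before blocking.
    strict_intents: dict = field(default_factory=lambda: {"action_request": 0.97})


def down_select(mail: Email) -> bool:
    """Cheap pre-filter so the expensive model only sees probable BEC traffic."""
    domain = mail.sender.rsplit("@", 1)[-1].lower()
    text = f"{mail.subject} {mail.body}".lower()
    payload_free = not (mail.has_url or mail.has_attachment)
    keyword_hit = any(k in text for k in BEC_KEYWORDS)
    risky_sender = domain in FREE_MAIL_DOMAINS or mail.sender_reputation < 0.3
    return payload_free and keyword_hit and risky_sender


def verdict(mail: Email, classify: Callable[[Email], Tuple[str, float]],
            policy: Optional[Policy] = None) -> str:
    """Combine the model's (intent, confidence) with policy to get block/allow."""
    policy = policy or Policy()
    if not down_select(mail):
        return "allow"  # not BEC-like; the rest of the detection stack still applies
    intent, confidence = classify(mail)
    if intent == "not_bec":
        return "allow"
    threshold = policy.strict_intents.get(intent, policy.block_threshold)
    return "block" if confidence >= threshold else "allow"


def stub_classifier(mail: Email) -> Tuple[str, float]:
    """Hypothetical stand-in for the BECShield model."""
    if "gift card" in mail.body.lower():
        return "gift_card", 0.95
    return "not_bec", 0.80


bec = Email("ceo.private@gmail.com", "Quick favor", "Can you buy some gift cards for a client today?")
newsletter = Email("news@contoso.example", "Weekly update", "Read more on our site.",
                   has_url=True, sender_reputation=0.9)
print(verdict(bec, stub_classifier))         # block
print(verdict(newsletter, stub_classifier))  # allow
```

Keeping the policy separate from the model means thresholds can be tuned per intent to hold the false-positive rate down without retraining.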
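The Q&A answer about feature size (a BERT tokenizer, a vocabulary capped at roughly 20,000 to 30,000 features, and the top 500 tokens of each email) can be sketched as a capped vocabulary plus truncation and padding. This is a simplified Python stand-in, not BERT's WordPiece tokenizer: the regex tokenizer, the `[PAD]`/`[UNK]` ids and the reading of "top 500 tokens" as the first 500 tokens are assumptions. For reference, BERT-base accepts at most 512 token positions, and its uncased WordPiece vocabulary has 30,522 entries.

```python
import re
from collections import Counter

MAX_TOKENS = 500  # "top 500 tokens" of an email, read here as the first 500
PAD, UNK = "[PAD]", "[UNK]"


def tokenize(text: str) -> list:
    """Lowercase words and single punctuation marks (a stand-in for WordPiece)."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def build_vocab(corpus, max_size: int = 30_000) -> dict:
    """Keep only the most frequent tokens, reserving ids 0 and 1 for PAD and UNK."""
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    vocab = {PAD: 0, UNK: 1}
    for tok, _ in counts.most_common(max_size - len(vocab)):
        vocab[tok] = len(vocab)
    return vocab


def encode(text: str, vocab: dict, max_len: int = MAX_TOKENS) -> list:
    """Map tokens to ids, truncate to max_len and pad up to a fixed length."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Hypothetical training corpus.
corpus = [
    "Can you share your cell phone number?",
    "Please buy some gift cards for a client today.",
    "I need an urgent payroll update.",
]
vocab = build_vocab(corpus)
ids = encode("Share your WhatsApp number", vocab)
print(len(ids))  # 500
print(ids[:5])   # "whatsapp" is out of vocabulary, so it maps to the [UNK] id 1
```

Fixing the input length is what lets every email become a same-sized input for the model; a real BERT tokenizer would also split unknown words into sub-word pieces instead of mapping them wholesale to `[UNK]`.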
