
Our next speaker, Eningga, is a security researcher at KO Networks with 10 years of cybersecurity experience, specializing in threat intelligence and data-driven detection methods, bringing practical expertise in turning enterprise security challenges into innovative solutions. Eningga will be telling us about LLMs and why they are not so good at some tasks, because people are always afraid that LLMs will replace all of us. Well, not entirely. Eningga, thank you. >> Thank you.
>> Quick show of hands. Who here doomscrolls on Twitter? Raise your hand. No way, only a few of you. Okay, if you're not scrolling on Twitter, you should. It's really funny. This is just an example of what you can see there. But jokes aside, there's actual threat intelligence on Twitter, and a lot of it. Just look at this example. This is the incident that happened last week with the CVE that got released on React. You probably heard about it, React2Shell. Now everybody is trying to weaponize it, and it was on Twitter before it was on any traditional news channel. So why is it that today a threat intelligence platform still looks like that? We have feed platforms with vendors like VT, IPQS, and AbuseIPDB, and all of them collect suspicious domains, URLs, file hashes, everything that we should collect and block. And that's great, we need it. But can we do something that works faster? Can we actually use social media and add it to our platform? Can we consume IOCs like file hashes and domains and block them immediately? And CVEs, like the one from last week about React2Shell, can we catch them as soon as possible? So hi, I'm Eningga, and today I want to talk to you about how we can use LLMs to look at Twitter as a threat intelligence platform.

Before I show you exactly how I did it, how I gathered all the information on Twitter and blocked it, I want to talk about what we're looking for. We're looking at C2 servers, IPs of C2 servers that we can block. On Twitter, it looks like this: we have a tweet with an IP that is suspected to be a C2, and we have tags like malware and threat intel. I will go over the tags later on. If we are looking at phishing domains, those "Googles" with triple O instead of double O, it looks like this: we have a Trust Wallet phishing domain, the "secure assets" one, with tags of phishing, crypto scam, all that fun. So we want to collect those and add them to our blocklist so we can actually block them in real time. But we also have a lot of this. This is a security researcher, and he's actually really good; he shares a lot of malware and malware analysis, but also a lot of images like this one. And he actually said, "Isn't this a cybersecurity malware account? All they do is cry about AI and post pictures of cats." Yes, and yes. So how do we go from all the noise on Twitter, all the jokes, the memes, all of that, to actual threat intel?

We start with finding the relevant users and keywords, who we actually should follow. And you will be surprised to know that it's actually hard to find security researchers who share more threat intel than memes. Next, we need to fetch it periodically. It's not enough to catch it once; we want to be real time, so every time there's something new, we catch it. Next, we need to extract the IOCs, the TTPs, the malware, the campaign, everything that's there. We need to extract it and enrich it to get a better picture than just "okay, you should block this IP, it's malicious"; we want to add more information.

Our first mission is to figure out who to look for. Let's say we have one user that we want to follow. We know this specific user shares interesting stuff. There's actually a lot of academic work that treats social media as a network graph: the users are the nodes, the follows and the comments are the edges, and we can traverse the graph to find more users who are interesting to follow. We can do that, but it's a little bit problematic. It takes time, requires resources and research, and we wouldn't do it only once; we'd need to update it every week, or all the time, so we always have the freshest sources to follow. But luckily we have LLMs now. We don't have to build this graph; we can just ask an LLM who we should follow. Here I'm using Grok, which is particularly good at this because Grok also indexes Twitter. So I asked Grok, "Hey, I'm a security researcher. I'm looking for C2 IPs, phishing domains, everything that might be interesting for a threat researcher. Who should I follow?" And I got a very long list of all the users I should follow: when they last posted, what kind of IOCs they share, how relevant they are to me, who should be my first priority if I have a limited budget.

So I have this list, and it looks great, but I have a slight problem of hallucination. For example, this user is not an actual user; this is just a placeholder, because it happens: the model hallucinates. The problem is that it responded with such high confidence. So how do I know who I should follow and who was hallucinated by the model? And this is the actual user, and if you're looking for threat intel on Twitter, highly recommended: a lot of malware analysis, C2 IPs. So the hallucination didn't come from thin air; it came from a similar username. So what I can do is improve my prompt and force the model to use tools. Grok can actually use the X API. So I force the model; you could say I'm basically begging the model to do it. Sometimes that works; later on I'll show you when it doesn't. But here it works, and I got a really long list, even better, in the format that I need, with validated users. Same guy from before, confirmed. I still have some hallucinations, but now it's so much better, and I can just validate the list against the X API.

Next, we have who to follow and what keywords to look for, but now we need to collect it. So we use the Twitter API to fetch all the tweets. We start with a baseline: we build a baseline for, let's say, a week, what happened in the last week. After we have the baseline and some picture of what's going on, we build on it incrementally and fetch every 15 minutes, every half an hour, whatever we choose, so it's as close to real time as possible. So the next time there's some massive CVE or some IP being exploited, we can fetch it. And if you're wondering how expensive it is to use the Twitter API — X, sorry, I will call it Twitter forever — it's $200 for the basic tier, which gives you around 15,000 tweets a month. This is not a lot. You can also use the pro account, which is more expensive, but if you have the money to spare, go for it. And we have a backdoor: there are a lot of scrapers for Twitter on GitHub. You can find them. I'm not going to elaborate on scrapers — you probably know why, so I'm not going to say it in a recorded session — but I invite you to explore.

Now we have the users that we want to follow and we've collected all the data, but we just have a lot of text. So how do we take this text and convert it to threat intel?
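The collection step just described — a weekly baseline, then incremental fetches every 15 minutes — can be sketched roughly like this. Everything here is illustrative: `fetch_tweets` stands in for whatever Twitter/X API client or scraper you actually use, and the tweet shape (dicts with an integer `id`) is an assumption.

```python
import time

def make_incremental_fetcher(fetch_tweets):
    """Wrap a raw fetch function so each call returns only tweets
    newer than the last one already seen per user.

    `fetch_tweets(user, since_id)` is a stand-in for the real
    Twitter/X API call (or a scraper) and is assumed to return
    tweets as dicts with an integer `id`, newest first.
    """
    last_seen = {}  # user -> highest tweet id processed so far

    def fetch_new(user):
        since_id = last_seen.get(user, 0)
        tweets = [t for t in fetch_tweets(user, since_id) if t["id"] > since_id]
        if tweets:
            last_seen[user] = max(t["id"] for t in tweets)
        return tweets

    return fetch_new

def poll_forever(users, fetch_new, handle, interval_s=900):
    """Poll every 15 minutes (the cadence from the talk) and hand
    each new tweet to the extraction pipeline via `handle`."""
    while True:
        for user in users:
            for tweet in fetch_new(user):
                handle(tweet)
        time.sleep(interval_s)
```

The first call per user acts as the baseline; after that, the `since_id` bookkeeping makes every poll cheap, which matters when the API tier caps you at 15,000 tweets a month.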
Okay, spoiler alert: you know I'm going to use an LLM. It's the title of the talk. But why do I need an LLM? This is an example of a tweet from a user that I follow, and it looks pretty simple. It's just a C2 with an IP and some tags for the malware and the campaign. Looks like a pretty simple task for a regex. So why do I need an LLM? Well, the same user that I follow, who I know is interesting for me, also shares tweets like this one. Here I also have a link, but it's a blog post about a phishing attack that you can go and read if you want to know more. This is a legitimate link; I don't want to extract it with a regex accidentally. If I do that, it will be really bad: everyone will just turn off my feature and will never get IOCs to block. So I really need to read the context of the tweet, so I know whether this is a legitimate link that I should allow or something malicious that I need to block. And ideally, on a tweet like that, I'd get a JSON response with a structure of the malware name, the type of the value that I'm about to block (here, C2), the value itself (the IP that I want to block), and it would be amazing if it also included the sample link, so I can go later and analyze the malware.

Unfortunately, it just didn't work. All I wanted was a simple tweet in and threat intel out, but it didn't work out of the box. And if you've worked with LLMs before, you probably know there are some tricks you have to use to make them work better. The first problem I had was false positives. This is a tweet, and I'll give you a second to scan it. Do you see anything suspicious here? Raise your hand if you see anything suspicious. Well, there isn't; nothing here is suspicious. Sometimes I get tweets from researchers I follow that don't have any information for me. But unfortunately, the model tagged the link to the tweet as malicious, with the malware name "Endgate". This is clearly a false positive, and it happened 57% of the time, which is a lot. I cannot trust a platform like that; we have to fix it. And these are just some of the links that came back as malicious, and clearly none of them is malicious. If you see one of them, don't report it — and this is just a sample; I had so many more false positives.

So what we can do at this point is teach the model to be better. We can use few-shot examples: you give the model example pairs of "this is a tweet, and this is what I want to get back." We need to teach the model what to extract, what not to extract (really important), and to show empty examples: tweets paired with empty responses, so the model learns that it's okay to answer "I don't know, I didn't find anything." For us, it was especially important to use negative examples, even more than positive ones, because we had so many false positives. We went from 57% false positives to around 2%.
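The few-shot setup just described, with heavy use of negative and empty examples, might look roughly like this as an OpenAI-style chat message list. The example tweets, the malware names, and the JSON schema are all invented for illustration; the talk does not show its exact prompt.

```python
import json

SYSTEM = (
    "Extract threat intel from tweets. Respond only with JSON of the form "
    '{"iocs": [{"malware": "...", "type": "...", "value": "..."}]}. '
    'If the tweet contains no IOC, respond with {"iocs": []}.'
)

# Few-shot pairs: what to extract, what NOT to extract, and an
# empty response for pure noise. Negative examples mattered most
# in the talk, since false positives dominated.
FEW_SHOT = [
    ("C2: 203.0.113.7:4444 #RemcosRAT",  # positive: real IOC
     {"iocs": [{"malware": "Remcos", "type": "c2", "value": "203.0.113.7"}]}),
    ("Great write-up on a phishing campaign: https://example.com/blog",
     {"iocs": []}),                       # negative: legitimate link
    ("All we do is cry about AI and post cat pictures",
     {"iocs": []}),                       # empty: no intel at all
]

def build_messages(tweet_text):
    """Assemble the chat messages for one tweet to classify."""
    messages = [{"role": "system", "content": SYSTEM}]
    for tweet, answer in FEW_SHOT:
        messages.append({"role": "user", "content": tweet})
        messages.append({"role": "assistant", "content": json.dumps(answer)})
    messages.append({"role": "user", "content": tweet_text})
    return messages
```

Showing the model a legitimate link paired with an empty answer is what teaches it that "respond with nothing" is an acceptable outcome.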
Which is great, so much better than before, but there's still a long tail. At this stage we have two options. We can fine-tune the model to make it even better: instead of just giving it examples, we rewire the model so it's better at this specific task. And the other thing we need to do, also important, is obviously not to block any IOC based only on a tweet. We can verify by checking how popular the IP or domain is — is it something like Google? Don't block Google, for example.

We also had another problem. This is a tweet with a C2 IP. It looks pretty straightforward and simple to extract. But sometimes the model gets a little too creative: instead of just responding with the JSON structure we need, it wraps it in some kind of string, or YAML, or a Markdown file — everything except the JSON structure we need. And this happens 10% of the time. The problem is that it's not just some false positive we can fix later; it translates to data loss. Every tweet that comes back in an incorrect format is data loss — IOCs or CVEs that we'll never see again. Usually what happens in cases like that is we start begging the model to do better: "Oh please, can you respond with a JSON?" And if that doesn't work, we say, "Okay, respond only with a JSON and do not include anything else." And it even escalates to, "Okay, if you're not going to give me a JSON, I'm going to switch to a different model." Everybody knows this over-engineering of the prompt to get a better response. Well, it doesn't work. But we have something to the rescue: it works on Gemini and on OpenAI, and you can check whether it works on other LLMs, but we can actually enforce the model to respond with JSON. It then checks every token of the output so that it's valid JSON. The slide is a little cropped, but I will tell you that now you can stop begging the model; it will respond only with the JSON that you need.

So what were we able to find? Looking at one week of threat intelligence on Twitter: 1,300 malicious domains, 700 IPs, 2,000 URLs, and 130 file hashes. All of those are indicators from Twitter alone. We didn't pay anything except for the expensive API, obviously; it was just sitting there in front of us. And these malicious IOCs are not IOCs from last month, last year, or some report — they're active right now.
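The "creative formatting" failure described above can be attacked from two sides: enforce JSON at the API level (for example OpenAI's `response_format={"type": "json_object"}`, or Gemini's `response_mime_type="application/json"` generation setting), and defensively unwrap whatever still comes back so a formatting quirk does not become data loss. A minimal sketch of the defensive side — not the talk's actual code:

```python
import json
import re

def parse_model_json(raw):
    """Best-effort recovery of a JSON object from a model reply.

    Handles the failure modes from the talk: the reply wrapped in
    Markdown code fences, or surrounded by chatty prose. Returns
    the parsed dict, or None if nothing is recoverable (i.e. the
    tweet is lost and should be counted as data loss).
    """
    # Strip a Markdown code fence if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # Fall back to the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
```

With JSON mode enforced at the API, this parser is mostly a safety net, but it turns the 10% silent-loss case into something you can measure and alert on.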
People in the community are talking about them. And this is an example of that. This is a tweet from a threat hunting team; they shared a file hash, and at the time it was shared, no one recognized it as malicious: the VT score was zero. That means if you look it up on VT and check all the vendors that flag things as malicious, none of them flagged this one. But if you check again just a few days later, you see it's getting traction; somebody noticed it's malicious. So by gathering all this threat intel from Twitter, or any other social platform you choose, you can catch it before everybody else.
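Before any of these indicators go into a blocklist, the earlier caveat applies: verify first, and never block something popular like Google. A minimal gate might look like this — the allowlist, the popularity threshold, and the ranking source are all invented for illustration:

```python
import ipaddress

# Never-block list: hypothetical examples of infrastructure you
# obviously must not block, however a tweet labels it.
ALLOWLIST_DOMAINS = {"google.com", "microsoft.com", "twitter.com"}

def should_block(ioc_type, value, popularity_rank=None, top_n=10_000):
    """Return True only if the IOC survives basic sanity checks.

    `popularity_rank` stands in for a domain-ranking lookup
    (e.g. a Tranco-style list); anything inside the top N is
    treated as a likely false positive and skipped.
    """
    if ioc_type == "domain":
        base = value.lower().removeprefix("www.")
        if base in ALLOWLIST_DOMAINS or any(
            base.endswith("." + d) for d in ALLOWLIST_DOMAINS
        ):
            return False
        if popularity_rank is not None and popularity_rank <= top_n:
            return False
        return True
    if ioc_type == "ip":
        addr = ipaddress.ip_address(value)
        # Private/reserved addresses in a tweet are noise, not C2s.
        return not (addr.is_private or addr.is_reserved or addr.is_loopback)
    return True  # hashes/URLs: pass through to further enrichment
```

A gate like this is cheap insurance: one bad block of a popular domain and, as the talk puts it, everyone turns the feature off.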
Another example from the same group. This is a file hash, and the funny thing about this one: it was signed. This is malware, and it was actually signed, with a signer name spelled almost like "Microsoft" but not quite, and no one tagged it as malicious. Then it got flagged hard — and if you use VT, you know that once 29 vendors say it's malicious, okay, it's malicious.

So we collect a lot of IOCs and we block them. But can we tell a better story? If you have a malicious IP on your network, where are you being targeted? Who is trying to attack you? Can we tell a better story than just "oh, we blocked it"? So our goal is to find IPs and tell the story behind them. For example, in this little cropped snippet, this is the IP of a C2. I want to know who operates this C2. What malware family is it? How dangerous is it? Should I block it right now, or can I investigate it first and then add it to my blocklist?

So we enrich this data with a RAG system. What we do is build a knowledge base: we take MITRE techniques, malware families, threat actors, everything we have, and we enrich the data. And we do this not just once; we do it over time, so it stays updated with all the new threats, all the new threat actors and malware, and we have an LLM with context. That context is our RAG, and the next time we send a tweet, we send all the context with it. We're no longer using an LLM whose training was cut off last year; now we use an LLM with all the knowledge we've gathered. So now we can take a tweet with an IOC and get everything we want to know about that indicator.
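The RAG step just described can be sketched very minimally: a small knowledge base keyed by malware family or threat actor, a naive retriever that matches the tweet text against it, and the retrieved entries prepended as context for the LLM. The entries here are illustrative stubs, not real intel, and a real system would retrieve with embeddings rather than substring matching:

```python
# Illustrative knowledge base; in the real system this would hold
# MITRE techniques, malware families, and threat actors, refreshed
# over time rather than frozen at the model's training cutoff.
KNOWLEDGE_BASE = {
    "remcos": "Remcos: commercial RAT, commonly delivered via phishing; ...",
    "qilin": "Qilin: ransomware group; ...",
}

def retrieve(tweet_text, kb=KNOWLEDGE_BASE, k=3):
    """Naive retriever: return KB entries whose key appears in the
    tweet. Stands in for an embedding-based similarity search."""
    text = tweet_text.lower()
    hits = [doc for key, doc in kb.items() if key in text]
    return hits[:k]

def build_enrichment_prompt(tweet_text):
    """Prepend retrieved background to the tweet before asking the
    LLM to extract and describe the indicator."""
    context = "\n".join(retrieve(tweet_text)) or "No background found."
    return (
        "Background intel:\n" + context +
        "\n\nTweet:\n" + tweet_text +
        "\n\nExtract IOCs and describe the threat using the background."
    )
```

The point of the pattern is in the last function: the model answers from a knowledge base you refresh continuously, not from whatever it memorized before its training cutoff.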
And our story looks like this. We have a tweet with the IPs, the ports, and some tags of Remcos and RAT — that's all we have in the tweet. But in our system, we can enrich it: what are the capabilities, what are the TTPs, everything we can add on top of the tweet. This is a snapshot from the platform — by the way, I'm going to share the code; it's going to be published as open source. We have the IP and its score on VT, but on every IOC we also have the story: Remcos, everything we know about Remcos to date. And we have a graph, and ideally, over time as we run it, we'll have a really big graph of every threat actor and, for every malware, the indicators of compromise related to it.

I've talked a lot about IOCs, indicators of compromise, but what about CVEs? We just saw last week that CVE exploitation shows up on Twitter as well. And this is an example of a CVE on Twitter. But from just one tweet, I'm not sure if it's actually exploited. Can I trust it? I'm not going to trust every tweet on Twitter and shift the entire team to work on something just because of that. So how do I know if it's actually exploited? How do I know if the CVE that was just published is risky? What is the CVSS score? Is it known to be exploited? How do I know all of that from one tweet, without going to Google to check and research? Can I get it in one platform?

So what Twitter actually gives us is that instead of looking at one tweet, we can harvest the community. We have a lot of security researchers, and we can see trends, like the one from last week that everybody was talking about — so obviously we need to fix it as soon as possible. In this example, this is a CVE from last year that is being exploited again now, and CVE Trends reported that it's actually legitimate. And we have another tweet that I can't read, but Grok can read it and summarize it: this is a CVE that suddenly got exploited, and it might be related to the Qilin ransomware group. So now I can see not only the CVE but who is actually exploiting it. And on our platform it looks like this: the CVE, all the details we have on an average CVE, the CVSS score, how exactly it is used, but also an enrichment, if you want to investigate, of who uses it. For example, here it's enriched with Qilin at 70% confidence, and we have all the links and the sources that reported it, so we have a better story than just a CVE.

So what's next? Currently we analyze a lot of text, but we ignore images. This is an example of a tweet with an image, and one of the IPs in it was not mentioned in any text. So to improve the system, we need to add image processing, and we'll do that later on. And this is our platform. Its name is RavenX — as in Twitter; this time I got the bird right — and I'll share the code later on. This is how the platform looks. We have the world map with all the malicious IPs and their locations. We have briefings of what happened in the last 24 hours, in case you missed it; all the IOCs that we collect, with their score, their graph, how they look, who operates them, everything about the IOCs; the CVEs — for each CVE we actually have a trend graph, and you can see that React2Shell was really popular; and we have all the raw tweets, what we collected, and the sources and how credible they are. So hopefully, when you investigate any IOC or CVE, you can find all the information you need on this one platform. So to wrap it up: social media is an intelligence gold mine.
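As an aside, the per-CVE trend graph shown on the platform reduces to counting mentions per day over the collected tweets. A sketch — the tweet record shape (a dict with `text` and an ISO `date`) is an assumption made for illustration:

```python
from collections import Counter
import re

# Standard CVE identifier pattern: CVE-<year>-<4..7 digits>.
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,7}", re.IGNORECASE)

def cve_trend(tweets):
    """Count CVE mentions per (cve, day) from collected tweets.

    Each CVE is counted once per tweet, so a single excited tweet
    repeating the ID doesn't inflate the trend. A sudden spike for
    one CVE is the "everybody is talking about it" signal.
    """
    counts = Counter()
    for tweet in tweets:
        for cve in {m.upper() for m in CVE_RE.findall(tweet["text"])}:
            counts[(cve, tweet["date"])] += 1
    return counts
```

Plotting those per-day counts per CVE gives exactly the kind of trend line where a year-old CVE suddenly spiking again stands out.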
We have so much information in front of us, and obviously it's not possible to just scroll and read everything. So LLMs are really helpful with that; they can do it for us, but they just don't work out of the box. What we need to do to improve them: a RAG system, enforcing the JSON schema, few-shot examples, fine-tuning. We have so many tools to make LLMs work better for us. Thank you very much.