PERSEPTOR: Automating Detection Rule Generation with AI-Driven Threat Intelligence

Name: PERSEPTOR: Automating Detection Rule Generation with AI-Driven Threat Intelligence
Uploaded: 2026-05-19
Duration: 34 min 12 s
Description: PERSEPTOR: Automating Detection Rule Generation with AI-Driven Threat Intelligence - Fatih Erdogan & Aytek Aytemur In an era of rapidly evolving cyber threats, organizations face constant challenges in detecting and mitigating sophisticated attacks. Staying ahead of adversaries demands not only rob

BSides Prishtina | 34:12 | 33 views | Published 2026-05

Mentioned in this talk

Tools used

curl Selenium Splunk YARA

About this talk

PERSEPTOR: Automating Detection Rule Generation with AI-Driven Threat Intelligence - Fatih Erdogan & Aytek Aytemur

In an era of rapidly evolving cyber threats, organizations face constant challenges in detecting and mitigating sophisticated attacks. Staying ahead of adversaries demands not only robust defense mechanisms but also intelligent systems capable of transforming raw threat data into actionable intelligence. PERSEPTOR is a cutting-edge threat intelligence project designed to streamline the process of extracting actionable insights from diverse threat reports. Leveraging state-of-the-art LLMs and the LangChain framework, PERSEPTOR autonomously summarizes threat reports, identifies TTPs, extracts IoCs, and generates Sigma and YARA rules using AI-driven mechanisms to minimize false positives. The project also provides tailored query recommendations for various cybersecurity products, enhancing its practical adaptability across different operational environments. PERSEPTOR specializes in automating detection content creation and prioritization, enabling Blue Teamers, SOC Analysts, Incident Responders, Threat Hunters, and Threat Detection Engineers to efficiently organize and implement detection rules. Through real-time analysis, PERSEPTOR empowers these teams to effectively prioritize threats and optimize response strategies, significantly enhancing their ability to detect and mitigate emerging cyber threats.

This presentation delves into the conceptualization, development, and implementation of PERSEPTOR, highlighting its modular architecture, advanced AI-driven functionalities, and transformative impact on modern cybersecurity operations. Furthermore, the presentation will discuss the applicability of PERSEPTOR in cybersecurity operations through various use case examples.

Transcript [en]

Thank you. Hello everyone again. First of all, we would like to say that being in Kosovo for BSides as Turkish people means a lot to us, because we know our connections. We will start with a quick introduction, with Fatih. Oh, it's not working. Oh my god. So, it's Aytek again. Currently I'm working as a blue team engineer at Picus Security. I'm kind of a fresh graduate, as of last summer. Before that I was working part-time, as I said, and then I converted to full-time after graduation. Before Picus Security I was creating detection content for SOC Prime as a freelancer, and some of my main

areas are security research and development, threat exposure management, endpoint security, threat research, and threat intelligence. One of the fields that I have developed a bit more than the others is detection engineering. You can see more than 200 detection rules on the SOC Prime platform, and I have some badges from SOC Prime as well: one is all-time top detection content authors by customer choice, another is privilege escalation detection master, and the last one is [unclear] professional. Besides my security career, I was a professional goalkeeper, as you can see. I played for about 10 years. It doesn't have to be cyber security: you can come and find me after the

presentation to talk about [unclear] or whatever you want. Thank you, that was about myself. Fatih will continue. Hi everyone, this is Fatih. I've been working in cyber security for over 7 years. Currently I'm working as a senior blue team manager at Picus Security. I have worked on various digital forensics, incident response, and threat research teams throughout my career. I have focused mostly on endpoint security and developing security tools. I am also interested in vulnerability research. I have also participated in cyber security events as a speaker and trainer. You can find the details on my blog. Thank you. So, what are you going to hear from us today? We will start with the question of

why we created this tool, and then we are going to explain the concept and architecture. We will give some use case examples. We are going to see PERSEPTOR's outputs. We are going to explain some challenges that we faced during the development phase, some impact and benefits, and some exciting future work. Then we will come to the conclusion, and after that we will take your questions. So why did we create this tool? First, especially when I was creating detection content for SOC Prime, I had to read 10 to 15 threat reports each day, and there was no automated pipeline to extract all the patterns, all the

behaviors, and then decide whether I could create content or not. With PERSEPTOR we are trying to solve that. One of the main points of PERSEPTOR is taking that time burden off the analyst's back. This year, in 2025, every day we are seeing new APT groups, and we are facing new vulnerabilities and bypass techniques. Threat intelligence teams can't keep up with the growing volume of threat intel, and if there is no automation tool for this process, security teams can be overwhelmed. So Fatih will explain some key capabilities. >> Yeah. So what is PERSEPTOR? What did we develop? With PERSEPTOR we can

analyze threat reports quickly with AI and run effective detection engineering operations. We designed an AI-powered approach. This is the back end, and everything is connected to it. We analyze any threat report with this AI-powered approach: campaigns, malware write-ups, CVE disclosures, emerging threats, and so on. From this we obtain contextual threat intelligence data, such as threat summaries, IoCs, TTPs, and execution chains. Using this contextual data, we automate the detection content creation process. We generate Sigma rules and also their SIEM queries. And finally, we find matching global Sigma rules in the wild. Let's take a look at the

workflow steps. We have seven main steps. PERSEPTOR first takes the URL of the threat report as input. It parses the content. If there are images, it analyzes them using OCR, optical character recognition, and it normalizes the entire report. Next, it extracts all IoCs and TTPs mentioned in the report, then learns all the attack details and summarizes them. It identifies the target sectors relevant to the attack and determines CVE relationships. It also correlates the threat and malware families related to the attack. After the report is analyzed, the detection content creation process begins with YARA rule generation. The YARA rule is automatically

developed from the indicators with our algorithm. Likewise, the Sigma rule is automatically developed using the IoC and TTP mapping. The final rules are ready for use after going through an AI-powered optimization. The created Sigma rules are then converted into Splunk and QRadar queries, providing ready-to-use detection queries for SIEM systems. Finally, we don't rely only on the contextual data in the report; we also use this data to search for matching detection content globally, so the contextual data is also associated with global detection content. Thank you. So I will try to explain our module structure, meaning which module we developed for which function. First, we have a module for IoCs and TTPs:

basically all the pattern extraction. As you can understand from the name, we are trying to extract all the patterns, not just IoCs or TTPs but also the chains, like suspicious parent and child processes. Then we analyze them to map them to MITRE ATT&CK IDs, and we identify malware behaviors, all the chains, and execution techniques. Which technologies did we use? We use the HumanMessage and RunnableSequence functions from LangChain and also requests for behavior extraction, BeautifulSoup and regex for web scraping and parsing, and LangChain and the GPT APIs for AI-assisted threat analysis. So our second module is for image

processing. This is for extracting all the hidden indicators from images and screenshots, because they can be very important: when you are reading a report, you can see that all the chains and patterns may be explained in an image. So it was crucial to process all the images, and we are using OCR to extract all of the embedded information. We capture JavaScript-generated content and dynamic images from the URL that contains the threat report. We also handle conversion of every type of image, like SVG, PNG, and JPG. As for technologies, we have pytesseract and PIL for OCR, and we are using Selenium

and ChromeDriver for web-based screenshot capture. The next module is for threat analysis. It basically summarizes threat reports to extract CVE details, if the report includes them, targeted sectors, and attack chains, and then it identifies the targeted industries and all campaign details, with threat actor group names. We are using LangChain and OpenAI technologies for AI-powered analysis. We are also using the PromptTemplate and RunnableSequence functions from LangChain for prompt engineering and summarization. When we come to the PoC results, I will explain these in more detail. And of course, for structured data processing we have the JSON format. The next module is for

detection content generation. It basically generates Sigma and YARA rules dynamically based on all the patterns from the threat report, and then uses AI to refine the rules for accuracy. We have pySigma, LangChain, and the GPT API for Sigma rule generation, and for rule optimization we are also using LangChain and OpenAI. For the rule format we are using YAML, and for YARA generation we created our own algorithm, which I will explain when we come to our PoC results. As for user interface interaction, for now PERSEPTOR is terminal-based. Fatih was really set on using Streamlit or

some kind of web UI, but I was just as set on sticking with the terminal because we didn't have that much time. So, as I said, it's terminal-based for now. It displays real-time analysis results in a CLI-based UI. We are using the Rich library to format output for readability. That is important because we are going to see a lot of information from the execution of our tool: all the summarization, CVE details, and detection content. So it was also important to show these results in a structured format. As for the other technologies, for user interaction and logging we are using the sys and csv

modules. And our last module is for global Sigma matching. We parse thousands of Sigma rules in about 20 seconds. We focus on the selection block of the Sigma rules: we parse it, take the keywords, turn them into tokens, and then try to find these tokens in our IoC tables. As I said, we focus on the selection block, and we use regex tokenization with dynamic stop-word filtering. So what is stop-word filtering about? You can add stop words to reduce false positives, because you can see that some Sigma rules

include Windows or Microsoft indicators. So if you want to reduce false positives, you can add stop words with a configuration file. Then we give a match ratio computed between the report tokens and the Sigma detection content. I will explain in more detail when we see the PoC results. We use CSafeLoader for performance, concurrent.futures for parallel processing, and of course regex for tokenization. So I will try to explain two technologies that are really important. We can say these two technologies are the core of PERSEPTOR. But before that, I'm thirsty. I will drink my water.
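[Editor's sketch] The global Sigma matching step described above (tokenize the selection block, drop stop words, compare against report tokens, score a ratio) could look roughly like the following. This is a minimal illustration, not the speakers' code: the function names, the stop-word list, and the ratio formula (matched tokens divided by the rule's own tokens) are all assumptions. Rules are taken as already-parsed dictionaries so the sketch needs only the standard library; the talk says the real tool loads YAML with CSafeLoader.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stop words; the talk says these come from a configuration file.
STOP_WORDS = {"windows", "microsoft", "system32", "exe"}

# Tokens are runs of at least three lowercase letters, digits, or underscores.
TOKEN_RE = re.compile(r"[a-z0-9_]{3,}")


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into tokens, and drop stop words."""
    return {t for t in TOKEN_RE.findall(text.lower()) if t not in stop_words}


def selection_tokens(rule):
    """Collect tokens from every value under detection.selection* blocks."""
    tokens = set()
    for name, block in rule.get("detection", {}).items():
        if not name.startswith("selection") or not isinstance(block, dict):
            continue
        for value in block.values():
            values = value if isinstance(value, list) else [value]
            for item in values:
                tokens |= tokenize(str(item))
    return tokens


def match_rule(rule, report_tokens, min_hits=2):
    """Return a match record, or None when fewer than min_hits tokens overlap."""
    rule_tokens = selection_tokens(rule)
    hits = rule_tokens & report_tokens
    if len(hits) < min_hits:
        return None
    # Assumed scoring: rules with fewer indicators score higher for the same hits.
    ratio = round(len(hits) / len(rule_tokens), 2)
    return {"title": rule["title"], "hits": sorted(hits), "ratio": ratio}


def match_all(rules, report_tokens):
    """Score all rules in parallel and return matches sorted by ratio."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda r: match_rule(r, report_tokens), rules))
    return sorted((r for r in results if r), key=lambda r: r["ratio"], reverse=True)
```

The `min_hits=2` default mirrors the later remark that a rule is shown as matched only when more than one keyword is found. A thread pool is used here only to echo the mention of concurrent.futures; for CPU-bound parsing of thousands of files a process pool may be the better fit.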

Okay. So, LangChain. It's basically an open-source framework designed to streamline the development of applications powered by LLMs. Instead of limiting LLMs to isolated prompt-response patterns, LangChain enables the creation of multi-step, data- and agent-based applications. With the LLM feature, it powers natural language tasks using OpenAI and Hugging Face. I wanted to highlight Hugging Face here because if you are interested in AI, you can follow every update on Hugging Face. If you haven't checked it out yet, you definitely should. With the memory feature, we can enable dynamic and stateful interaction. With the prompt engineering feature, we were able to create tailored prompts for optimized outputs. And

with the agent executor, which we didn't use in PERSEPTOR, you can execute tasks dynamically using external tools. So with all those features, what can be done? At the end of the chain you can create question answering over documentation, and you can also create summarization, which we did in PERSEPTOR to summarize threat reports. You can create chatbots. You can query tabular data. You can interact with APIs, which we also did in PERSEPTOR. You can think of LangChain as a bridge between the user and GPT. With LangChain your inputs become more informative and more understandable. So you can interact with APIs, and finally,

you can create some functions for code understanding. So which LangChain features did we use in PERSEPTOR? LangChain's prompt engineering features gave us the opportunity to construct structured prompts to extract all the patterns and create detection rules. With the memory management feature, we are using entity memory to track previously processed threat intelligence data, ensuring context continuity. You can think of it as working in the same tab in ChatGPT, but at the API level: with LangChain it's possible to use the API with that memory management feature. The other feature that we use in PERSEPTOR is the chaining functionality: we were able to connect AI components like GPT for

seamless automation. The last feature that we use is enhanced automation: it can streamline the whole threat intelligence process to reduce manual effort and improve accuracy. So what is our AI approach? How are we using AI in this project? In this diagram we try to illustrate the AI-driven structure behind our automation pipeline. On the left we see the user interacting with the LangChain-based interface, which coordinates the core components: the LLM, memory, prompt, and agent. The green blocks indicate the elements we actually use. For all three core functions of the system, we rely on the OpenAI o1 model, which consistently provided the most optimal and balanced results in

our evaluations. Although models like GPT-3 or GPT-4 are available, we found that o1 outperforms them in detection quality and also in response clarity. However, using o1 at the API level requires a certain credit threshold: even if you have enough credit, you have to spend some money to use o1 at the API level. It's not a good thing, but it is what it is. On the reasoning effort side, two of our three functions operate under the high setting, while the third runs at medium. We initially avoided using high for all of them because it increases response time significantly. If you put high on all of the functions that you use to interact with

GPT, the time is going to be 30 or 40 seconds longer. In the end, for each processed URL we make three calls to the o1 model, two with high reasoning effort and one with medium, which results in an average cost of about $1.50 per analysis. If you want to decrease or increase the cost, you can consider using other models. It doesn't have to be GPT, but LangChain integrates with GPT, and that's why we use GPT. For reasoning effort, you can choose low or medium to change the cost. The other technology is Selenium. Maybe most of you know it, maybe not. It's also an

open-source framework that automates web browser interactions. It is commonly used for testing, web scraping, and dynamic data extraction from web applications. Which Selenium features did we use in PERSEPTOR? First, we have dynamic content handling: we were able to interact with JavaScript-rendered content, which makes it useful for extracting data from modern web pages. With web element interaction, we were also able to simulate human-like interaction, such as clicking buttons, filling in forms, navigating pages, and handling pop-ups. You can do a lot of stuff with Selenium. And with headless execution, you can run without opening a

browser window. We also use ChromeDriver with Selenium. That's how we take screenshots of images, and this feature also makes it efficient for large-scale automated tasks. The other feature we are using is image processing support: we can capture screenshots of web elements for OCR analysis, enabling detection of threats in images. That was very important. Why? Because at first we were trying to create functions for all image types: we were trying to convert PNG to text, SVG to text, and also JPG. Then we saw that even if we created 10 functions for all the image types, we could still see a new one

in the next threat report. That's why we decided to use Selenium to take a screenshot of the image and then send it to OCR to convert it to text. So what are the possible use cases for our project? You can basically turn raw threat intelligence into actionable detections in minutes. With PERSEPTOR you can automate the process of summarizing reports, extracting all behaviors, and generating detection content, eliminating hours of manual work for detection engineers, which is crucial, as I mentioned before. You can enhance detection quality using AI-generated insights: security teams can reduce alert fatigue by refining existing rules, mapping accurate MITRE techniques, and minimizing false positives through GPT optimization. And you can

also correlate indicators and behaviors across multiple threat actors: you can automatically identify patterns across reports, find recurring malware families, link IoCs to known campaigns, and highlight shared TTPs to strengthen your understanding of the threat landscape. The last thing you can do is check whether your local detection is already globally known. You can map analyzed threats to community Sigma rules like SigmaHQ, allowing analysts to detect overlaps and accelerate response based on global intelligence and shared detections. So after all the workflows, the modular structure, and some use cases, now we will check the PoC results. We chose a report from ESET about Operation FishMedley. You can scan the QR code to check what is

inside. By the way, Fatih created the QR code. So if something's going to happen, find the guy, not me.

So the user basically just needs to give the URL to our tool, and that's all the user has to do. Then we start to process all the images, and at the end of those lines we see the threat report summary being generated. After that we see the threat summarization and what it includes: we see the threat actor group names, and we see the targeted sectors and targeted regions. After that we see the severity level for the report. Then we see some key TTPs and some recommended mitigations. These mitigations can be too generic, but they can still give the analyst something.

And here you see the reference, showing where PERSEPTOR took all the information from in the report. Then we start to see our JSON data, which is crucial. I will explain why when we come to creating detection content. We start with the Sigma rule title, description, confidence level, and some notes, and then we continue with the IoCs, which are categorized: IPs, domains, URLs, email addresses (which we couldn't find in this report), file hashes, and file names. You can also see the references for these indicators. We continue with registry keys, process names, and malicious commands. I want to mention one thing. Especially after Daniel's presentation, you can

ask why we didn't use regex for all this extraction, because there are lots of tools for IoC extraction and TTP extraction. But what we are doing with LangChain is taking the whole package of malicious commands. You can see that we have some DLL side-loading behavior with rundll32, then some [unclear] commands with the tmp file, and then the registry activity. If we used regex, for what I showed you before, the IPs, domains, and URLs, maybe we would be able to take these patterns. But for malicious commands, for now

it seems impossible, but after Daniel's presentation we might check it. Thank you again, by the way. After that we are going to see the MITRE mapping. The cool thing in this part is the descriptions. In the description we are not explaining the MITRE technique; we are explaining why we mapped this technique. For example, we can check process discovery: we saw that the threat actor executed the tasklist command to discover processes. Even if this is a false positive for you, you have the opportunity to eliminate it quickly. After that we see threat actors and tool and malware names, and in the middle there is a reference,

and after all the information and summarization, now we see the detection content. First we see the YARA rules. Now I will explain our algorithm. When we were extracting all the IoCs, we didn't just extract them; we tried to categorize them, and after the categorization we created a little algorithm. For example, for IPs, use ascii fullword on strings; for domains, use nocase on strings. It's really basic and generic, but it can still give something to the analyst. The other rules we see are for file hashes, file names, registry keys, and some processes, malicious commands, and so on. After all that, we start to see our Sigma rules, which are backed

by pySigma. We had some problems applying pySigma in our tool. The first one: you can create a Sigma rule with multiple selections with pySigma, but with PERSEPTOR's output we were only able to create Sigma rules with command-line patterns. That's why we decided to use LangChain to create additional detection content. When we began using LangChain for creating detection rules, we thought it wasn't going to be stable and that it could change a lot, because it's AI output. But what we are going to see is better results than pySigma, for sure. After the pySigma-based Sigma rules, we

ask GPT to refine these detection rules. We saw that pySigma created five different Sigma rules, and when we asked GPT, it created one rule with five selections. After refining the Sigma rules, we ask for a Splunk query and a QRadar query. What we are seeing now are outputs from LangChain, not pySigma-backed Sigma rules. So look at the selection part, which is one of the most important parts of a Sigma rule. It's a clean Sigma rule with endswith and contains modifiers. There are no trash wildcards, there are no trash paths. As a detection engineer, I can say this is

a ready-to-use rule. Of course it can be generic for your system, or maybe you already have it, but it's still clean, and the syntax is better than the pySigma-backed one. The other rule we see from LangChain is for registry activity. It saw the registry hives with those patterns in the reference, and it created a registry rule. Like the other one: no wildcards, no paths, only a pure detection block. Another rule we see is for the file creation event type. It understood that this file contains a custom password filter, so it decided to create a rule to detect this file. What we are seeing here is so cool

for us, because it saw that there are many DLL files side-loaded by the threat actors, and it literally created a regex to detect all the DLLs at the same time. In the other selection we see some original file names, which the threat report also lists as suspicious or directly malicious. The other rule we see has multiple selections: in the first selection we see PsExec execution, in the second some suspicious WMI usage, in the third the quser command, and in the last, registry activity. After creating all the detection content, we have the possible global Sigma matches. What we did here is,

when we parse the selection blocks in the SigmaHQ repo, we try to find their keywords in our table. If we find them, and it's more than one keyword, you are going to see the rule as matched. So where does the match ratio come from? If you have checked the SigmaHQ repo before, you have seen that there are Sigma rules with 20 indicators. Of course we love Florian, but some rules can be trash. That's why, if the rule contains fewer than, I don't know, 5 or 10 IoCs, it's more important for us: the ratio is going to be higher if the rule is more

compact and more optimal. And here, on the Sigma rule title, you can click and go directly to the GitHub page to deploy it to your SIEM or EDR, whatever you want. So what were the challenges during the creation of this tool? First, complexities in image processing. I already explained this in the modular structure part: converting all the image types to text was a struggle, so we decided to use Selenium, and now it has an 80 to 90% success rate. Extracting malicious patterns I also mentioned in the modular structure part: with regex we were able to extract TTPs and IoCs, but taking them as whole packages was not

possible with regex. That's why we use LangChain. Next, GPT API limitations. At the beginning of the project, we were thinking of creating some kind of source pool that includes maybe 100 or 200 URLs: when they publish a report, PERSEPTOR is executed and we see the result. But when we saw the cost for one URL, we decided to stop there, because each day we would have had to spend maybe $500 or $600. Let's say every URL publishes one report each day: for 200 reports at $1.50 each, it would be too much. That's why we decided to go with just one URL, but we

will try to solve this. Then, issues with the pySigma implementation: I already explained the problems with pySigma. Then the MITRE ATT&CK mapping challenge. We saw that we could create some algorithm for techniques: for example, if we see cmd in the IoC table, we can map it directly to T1059. But then we saw that we would have had to spend maybe 2 or 3 months creating algorithms for all the MITRE techniques, which would be impossible. That's why we use LangChain for mapping too. Then Sigma syntax and logic optimization: you saw the pySigma-based Sigma rules, and syntax-wise they are not optimal. And the last challenge was the

parsing and processing of thousands of Sigma rules. If it took 30 or 40 minutes, our tool would mean nothing, because the main point of what we are doing is taking the manual effort off the analyst. So we had to do it in seconds, and thanks to CSafeLoader and concurrent.futures we solved this challenge. So what are the impact and benefits? Aytek talked about the internals and the in-the-wild usage of PERSEPTOR. We worked hard, we faced many challenges, and we developed PERSEPTOR in the end. Okay. So was it worth it? We think yes. We believe that with this complete automation we have made the

detection engineering process 10 times faster. Sorry for my voice. It's hard. Just a second. >> We believe that with this complete automation we have made the detection engineering process 10 times faster. While it may normally take some time to analyze a threat report and develop detection content, with PERSEPTOR you can do this in a few minutes. Normally a report may take two or three hours, but with PERSEPTOR the user will see all the critical information and outputs in 3 minutes. That's why we believe that we have reduced the operational pressure on blue teams by 50%, especially against emerging threats. So what is some of the exciting future

work that we are doing and planning? First, we are going to try to optimize the Sigma rule output, because, as you saw, the pySigma-backed detection content is not good enough for us for now. We are going to expand our SIEM and EDR queries: you are not going to see just Splunk and QRadar, you are also going to see CrowdStrike IOAs and Sentinel queries. Then we are going to try to give the user a comprehensive report after the execution, with all the detection content and all the information about the threat. After that we are going to try to create some kind of prioritization mechanism for detection content and

also for the whole threat report. Then false positive reduction: for now we are reducing false positives a bit with LangChain, but we are going to try to do it at a higher percentage. We are also going to try to build continuous threat intelligence processing. As I mentioned, because of GPT API limitations we were not able to do it, but we will try to solve this. And lastly, we are going to create a structured database of processed reports for the community and for some kind of correlation. If a user gives a URL to PERSEPTOR, everyone is going to see the outputs in one

database. So it might be cool, and we can also correlate these outputs with everything. So what is the conclusion? Other speakers have talked about AI, and we are trying to do something with AI. It's undoubtedly cool, it's great, but it's still not a silver bullet. It needs human review, it needs fine-tuning. So it's not going to replace your job or anything. It's kind of a cliché, but maybe not yet. Still, as I said, human validation is necessary. Okay, you might ask whether you are going to be able to use our tool. We are still adding some features, some cool

features. As soon as we finish, we are going to publish it on LinkedIn. You can add us to see the source code and use it, and if you can give some feedback, at the end you are going to see the QR code for the feedback session, because it's a tool and we believe in the community's strength. If you give us some feedback, that will be great. Any questions? If you don't feel comfortable, you can find us after this presentation. It doesn't matter. If it's not a problem, you can raise your hand. Any hands? No. All right. Any quick questions? Okay, good. You can find them later

in the hall. Thank you, Fatih. Thank you.
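[Editor's sketch] The YARA generation step mentioned in the PoC walkthrough (categorize the IoCs, then pick string modifiers per category: ascii fullword for IPs, nocase for domains) can be illustrated with a short generator. This is a hedged sketch and not PERSEPTOR's algorithm: the function names, the category keys, and the `any of them` condition are assumptions, and only the two categories whose modifiers are stated in the talk are handled.

```python
# Modifiers per IoC category, limited to the two the talk spells out.
MODIFIERS = {"ip": "ascii fullword", "domain": "nocase"}


def escape(value):
    """Escape backslashes and double quotes for a YARA text string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(name, iocs):
    """Render a YARA rule from categorized IoCs (hypothetical helper).

    `iocs` maps a category name ("ip", "domain") to a list of values.
    Categories without a known modifier are skipped in this sketch.
    """
    lines = [f"rule {name}", "{", "    strings:"]
    for category, values in iocs.items():
        modifier = MODIFIERS.get(category)
        if modifier is None:
            continue
        for index, value in enumerate(values):
            lines.append(f'        ${category}{index} = "{escape(value)}" {modifier}')
    # Assumed condition: fire when any listed indicator is present.
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Documentation-range IP and a reserved test domain, used as placeholders.
    print(build_yara_rule("Demo_Report_IOCs", {
        "ip": ["203.0.113.7"],
        "domain": ["example.test"],
    }))
```

A caller would need to make sure at least one IP or domain is present, since a rule with an empty strings section does not compile. As the speaker notes, rules built this way are basic and generic and are meant as a starting point for an analyst.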
