
BSidesSF 2026 - Incident Readiness You and Your Leaders Will... (Shachar Hirshberg, Hadar Waldman)

BSidesSF, 29:21, 12 views, published 2026-05
About this talk
Incident Readiness You and Your Leaders Will Actually Trust
Shachar Hirshberg, Hadar Waldman
Most security programs give leaders dashboards they don’t act on. This talk provides a practical, repeatable process that helps you ensure you are ready by turning log-health checks, detection-coverage mapping, and crown-jewel scenarios into a short, prioritized list of improvements leaders will fund.
https://bsidessf2026.sched.com/event/3e77d0f720c7425e301f9650f2c54755

Our speakers for this final presentation are Shachar Hirshberg and Hadar Waldman. They are from Artemis. Shachar is the CEO and co-founder, and Hadar is a security engineer, and she told me specifically to say not a co-founder. So, I will hand it off to Shachar, who will kick us off. Thank you. Thank you all so much for coming here. Can you hear me okay on the mic in the back? Yeah, perfect. So, thank you for the time. Hope it will be the most educational and entertaining presentation we have at BSides, at least until the end of the day, you know. So, today we'll speak about how to gain visibility into your environment. A

little bit about us. I'm Shachar, as mentioned, co-founder and CEO at Artemis. I've spent the past 15 years in technology and cybersecurity. As part of my career, I was at AWS for multiple years building Amazon GuardDuty, and before that I helped build Demisto and the SOAR space. Today at Artemis, we're helping companies stop attacks in their environment, either by completely replacing their legacy SIEM or by augmenting it. And today we're actually not going to speak much about Artemis, because we're going to come out of stealth in about two weeks, but we wanted to share some very interesting things that we found we were able to do, and to try to help you do it

at home as well, or at work, up to you. So, with that, I'm going to pass it to Hadar. Hi. There's going to need to be a height adjustment here. I'm Hadar. I've spent the last 12 years in detection and response: I was a SOC analyst, did detection engineering, was a SOC manager, did incident response, all sorts of things. Besides that, I'm also very excited about data analytics, and I'm a big musicals geek. So I want to personally thank the BSides San Francisco team for choosing this theme, because how else would I be able to present this slide, which is my Broadway statistics? Thank you for being here to see this. I've seen 280

shows. This is when I moved to New York; you can see the number spiked. These are my most re-watched shows. It was very important for me to share. Thank you for listening. Back to Shachar. I will say Hadar asked me for my stats to see if we could combine them, but they were not satisfactory, so we just did Hadar's. So, a bit about what we're going to talk about today. First, we're going to talk about why it's so hard to gain visibility into what is going on in production. This is particularly important today, and it's just going to become more important in the future, because if you imagine a future where, in one, two, or three

years, most of the activity in production will be operated largely by agents, you have to maintain visibility over what's going on there, and you probably can't just ask the agents to tell you what they're doing, for obvious reasons. So, we'll talk about the problem. We'll talk about how we discovered we're able to derive intelligence insights by looking at real telemetry in production environments, and how you can implement that at home. And then we'll talk a bit about how you can drive action within your organization to get to real outcomes, using the improvement opportunities that you identified. As I mentioned, production environments are incredibly complex. I want to start with a quick

question. Who here has some kind of CSPM or other posture system in place in their organization? Okay, I see about half of the hands, which is surprising. I would expect everyone to have it. Not sure about compliance here, but let's say people just don't raise their hands. Now, how many of you are confident that you know what is going on in real production environments, to the level that if I ask you what's going on with your roles, with your identities and the way they log into your environment, whether MFA is actually enforced in practice, or whether the data that is supposed to be analyzed by your SIEM or threat

detection systems is actually enabled at the source and is getting to the right destination? All of these things are incredibly hard to do, so I'll ask again with a raise of hands: who is really certain of all of this? Okay. Now it kind of matches my expectation. I see one hand. Well, moderator, I'm not sure; I feel like you're lying. But the reason for that is that real production environments are incredibly hard to model and understand. A lot of the time we have some guardrails in place, and we have some static analysis tooling in the form of various posture tools that tell us MFA is enabled, but it's really hard to

actually validate and know that things are enforced in production and are happening as expected. As part of that, you need the real data in order to build trust and drive action. So, with that, I'm passing to Hadar to talk about how we can actually do it. Okay, so the solution we propose, or kind of found for ourselves that works, is something we call an environmental intelligence report. The idea is pretty simple. You start with querying your logs wherever they currently are, you make a report out of the results using an LLM, and then you do it again. The idea is that if you track it over time, you'll be able to

find changes. If we talk for a second about what you need: you can use whatever you have in your environment. We used very simple tools as well. Wherever your logs are currently stored and however you're querying them is good. So if you have them in S3, you can query with Athena. If you have Splunk, use the Splunk API with SPL. The goal is just to be able to run queries through an API. Then you'll need to save your queries somewhere. I use Jupyter notebooks; you can use that, you can use scripts, or whatever. Then you'll need your AI coding buddy, like Claude Code or Codex, that will analyze the outputs and create the report. And you'll need to select a

report format. I find that Markdown works really well for LLMs, and then you can export it to PDF or doc or whatever you need. And I think the big question here is: what are you going to be looking for? We actually had a chance to experiment with a lot of different logs. We've run the same concept on endpoint, network, cloud, identity, and key-card access logs. But I'm going to take identity logs as the example, just because I think it's pretty easy to follow. So, certain questions that I would like to know the answer to, or certain categories that I think are interesting: one is the privileged actions in your

environment. So, what are your privileged accounts actually doing? To get to this, you're not going to take your list of administrators and query what they're doing. You can go the other way around: look for privileged actions. So, which logs indicate that a user had high privileges? Look for those, build visibility around that, and identify the accounts. And we found that there are a lot of unexpected results. First of all, there are a lot of dormant admins no one is using. Other things that you can find are admins performing actions that you don't expect, or from locations that you don't expect, or using user agents

you don't expect, and that's interesting to see. Then we have geography: where are authentications coming from? What are the top locations and countries? And what are the least-used locations and countries? That's also very interesting. You might find some VPN usage, or some less authorized activity, like an employee working from somewhere they're not supposed to be working from. Then we have apps: what are your employees actually using and authenticating to from your Okta or Entra or other identity provider? MFA usage is really interesting and gave us some interesting results, because there's quite a big gap between what you think your policy is and what is actually happening. We found,

so far we found, for example, a policy rule that was supposed to be deprecated two years ago, and it wasn't. Users are still authenticating with password only in a specific organization. We also found that some policies allow MFA factors that are not phishing-resistant, while the admins thought that everyone was using only phishing-resistant factors, so that was interesting. Failed and denied actions are also interesting, because you can find deprecated services that you thought weren't in your network anymore, but they actually are, because there was a user or credentials left behind and they're failing to authenticate. And that's also just an

unnecessary attack surface in your organization. And finally, in the suggestion category, service accounts also give interesting results. For example, a service account called "read only" that is definitely performing write actions. You need the actual data to be able to tell that, because you can't tell it by the name of the account. So, all sorts of interesting things like that. Another thing here: there was a service account that kept authenticating from the same IP, except for one IP from the home network of an administrator. So that way you find out that the administrator is using that account to perform specific actions that they didn't have privileges for. So those are some suggestions for the

capability of what we can do there. This is an important slide. So the question remains: how do you take all of this information that you queried and exported into files and turn it into a report? We're going to use an LLM, and I find that I really like using skills. Who here has used skills before with their coding? Great job, Dan. Okay, pretty good. So skills, for those who haven't used them, are a way to give an agent a set of instructions that you pre-shaped. You put all your instructions into a specific file and then you call that with a slash command. As you can see here, my command is slash

generate report, really original. That command is going to keep running the same analysis every time I call it. And this is important, because if you take all the outputs that you just created from my previous slide, give them to Claude, and tell Claude, "tell me something interesting," it will tell you something interesting. It will find things. But if you're trying to create a program in your organization, repeat this, and share the report with other people in the organization, you need consistency. And that's something you're not going to get if you just tell Claude to find something interesting in your data. So, a few tips for making a skill

that will create consistent reports. One is to use a template. For example, in my reports I like having a table of all the findings at the beginning, with specific columns, so it's repeatable and I can use it every time. I have that as a template in my skill folder, and the LLM just fills it in; it doesn't have to come up with a new version of the table every time. It makes it a lot easier. Use sub-agents; that's an important point. I think the longest report we generated out of the system was 60 pages. If you try to generate 60 pages out of

outputs with an LLM, you'll get a mess. Using sub-agents to break the task down into something the LLM can actually ingest in one go will give higher-quality output and consistency throughout the entire report. So it's not that just the first section will be good and everything else will be trash; it will be good throughout. A validation step is also important in the skill: actually have a step that checks the quality of each of the sections. So if one section is too short, it will re-trigger that sub-agent to try to improve that section. And give the LLM very clear writing guidelines on the language that you want it to use. And

that brings us to this point, which is how to prompt effectively. LLMs tend to be very confident in their lies. No, I'm kidding: their assumptions. Especially if you're sharing this internally with other stakeholders, you don't want it to make very confident statements about things that are essentially inference. So one tip is to require it to use citations. Now, when it's making a statement, it has to tell you what data it's based on, which makes it easier to review and forces the LLM to use specific data and know what it used.
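As an illustration of the validation step and the citation requirement described above, here is a minimal sketch in Python. It is not the speakers' skill: the `[source: ...]` citation marker, the `## ` section headings, and the function names are all assumptions made for this example. It splits a Markdown report into sections and lists the ones that are too short or carry no citation, which are the sections a sub-agent could be asked to regenerate.

```python
import re

# Hypothetical citation marker, e.g. "[source: okta_auth.csv]".
CITATION = re.compile(r"\[source:\s*[^\]]+\]")


def split_sections(markdown: str) -> dict[str, str]:
    """Split a Markdown report into {heading: body} on '## ' headings."""
    sections: dict[str, str] = {}
    title, lines = None, []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if title is not None:
                sections[title] = "\n".join(lines).strip()
            title, lines = line[3:].strip(), []
        elif title is not None:
            lines.append(line)
    if title is not None:
        sections[title] = "\n".join(lines).strip()
    return sections


def sections_to_retry(markdown: str, min_words: int = 30) -> dict[str, list[str]]:
    """Return {section title: [issues]} for sections that fail the checks."""
    problems: dict[str, list[str]] = {}
    for title, body in split_sections(markdown).items():
        issues = []
        if len(body.split()) < min_words:
            issues.append("too short")
        if not CITATION.search(body):
            issues.append("no citation")
        if issues:
            problems[title] = issues
    return problems
```

A check like this only tells you that a citation marker is present, not that the cited data supports the claim, so the human review the talk calls for is still needed.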

Then we have: enforce hedging. So, use language that's softer and not so absolute. Separate fact from inference: label some things as facts. If it did a count of a certain number of logs or a certain activity, that's fact. It's repeatable; you'll get it every time. And then there are some assumptions made on top of that fact, so those should be written as such. Then: sometimes, if the LLM is missing a piece of information, it could just fill in the gap itself, and that's not recommended. So instruct it: if you're missing a key piece of information, tell me, and I'll give you a query; I'll give you that information to fill it in. That's

important. And then the last tip is: challenge your claim. This is addressed to the LLM. So if you have something that you think is malicious, also come up with a benign explanation for it, then check the two against each other and see which one is more likely to be correct. And my last advice here is that when you share this report, at the end of the day it's your name on the report. Claude or any other LLM could have made a mistake, but if you shared it internally, it's now your mistake. So you need to review it very carefully, even if it's long,

even if it contains a lot of information, and you need to be able to answer questions on it confidently. Back to Shachar. All right. So we generated a report. And, for example, in the report we now have all the roles that are using permissions they shouldn't use, and maybe we have found some ways to save on cloud spend, because a certain service account has a misconfigured IAM policy that creates a lot of errors in CloudTrail that you eventually pay for in CloudTrail cost, S3 cost, and maybe SIEM cost. But you want to make it actionable, and you want to make sure you're able to drive action. So, a few things that we have found to be helpful when wanting to

drive action within an organization: first, for management, you want to lead with the impact. If you tell someone, "Hey, we have 300 roles that are over-privileged," that's fine, but what is the real impact? What is the risk that stems from it for my organization? That is the more interesting question. So you can frame your suggestions to leadership in a way that is explainable to them and that allows them to understand the importance and require action and collaboration across the organization. The most important thing that we have found with leadership is consistency, and this is why we recommend a weekly or monthly cadence for the reports, where you are

able to show progress. You can show the wins and the gaps this week and the plan for next week, and then the next week you come back and show that you fixed all of these things and reduced the risk for the organization, which brings the confidence to invest more in such a program. Then there are stakeholders that are more technical, for example the cloud security team. To tell a quick story: with, I think, six customers so far, each and every one of them was able to find things that saved them millions of dollars in their AWS spend, or let's say cloud spend. And when the security team finds these observations, they still

need to work with the cloud security team to make the fixes. With them, it's really helpful to bring more technical evidence: linking to the exact logs that show that the activity the report describes is actually happening, as well as tracking the resolution of the improvements together in order to drive action together. And finally, for cost, it's also really good to show when things are not actually used; that is, to quantify over-provisioning. Especially in cloud environments, you end up having so many resources that are not necessarily utilized. By looking at real telemetry over multiple months, you're able to prove that something is really not used; even if it's a backup

role, it probably would have triggered once a month or something like that. So all of this, as Hadar mentioned, is something that we discovered a few months ago when we started Artemis: the power of analyzing logs and combining them with business context. We eventually built it into a capability in the product. And you can do it yourself. You can run the queries, and Hadar shared a lot of the prompt tips and the structure to be able to drive these insights yourself. What we can say from our side is that in every organization we go into, we find a ton of stuff across identity, cloud, network, and endpoint that nobody knew about. And you can do it

with the tools that you have at home and really drive impact for your organization. So I highly recommend taking a look. And a few final takeaways. First, what we see is that the gap between the static configuration and policy and the reality in production always exists. People are always surprised by what is found in production, especially at enterprise scale, and this probably doesn't come as a surprise. The second thing is to start with questions. Think through what is interesting for you. Are you working on improving your identity this quarter? Are you working on cost-saving improvements this quarter? Starting from these questions, focusing on them,

building a program, iterating, and then continuing to the next effort. And finally, the value compounds. As you keep up the consistency and show that the actions you took last week or last month reduced your cloud spend, secured your environment, or reduced the number of users logging in without MFA, you're able to drive the story of improvement across your organization. With that said, thank you again for the time, and we'll open it up for questions. I think we have the Q&A that the moderator has. Are we passing the mic around, or is it just people asking? We're going to read questions from Slido, so

please put your question in here at bsidessf.org/qna, and if we run out of Slido questions, then we will ask you to say your question out loud, and then we will repeat it over the microphone. So, do you want to take this? Do you want me to do it? Okay. Oh, I love this one. "Can you give an example of when an LLM created a really bad report?" I'll let her talk. Take it. Yeah. Okay, so a key element of this is giving the LLM the correct data and the correct knowledge to interpret the results. I think in one of our first attempts we told someone that they had no MFA in their environment

because we didn't read the authentication data correctly. That wasn't great. And there's a lot of stuff like that: essentially, the LLM is taking data, and sometimes (oh, there are great times) it's just reading half the data instead of the full view and then giving you results based on that. But nothing that was catastrophic, I think. And I'll add to that Hadar's previous point: you ultimately own the report; you ultimately own what you report to leadership. So it's always important to do the sanity check and the validation. The next two questions are kind of similar, around data privacy. One is, how do you convince customers

that it's okay to be using Claude to analyze data, and similarly on data privacy. So, for us specifically, we use Bedrock to run any LLM call, and we work in a single-tenant model, so every customer has their own VPC. Bedrock, and AWS more broadly, promise that data sent via Bedrock is never used to train Claude and Anthropic models, and I worked for AWS, so I'm pretty confident they truly keep their word about it. So that helps our customers. It is important to know that if you do it for your organization, the LLMs you're using most likely need to be under

the enterprise version, just to make sure you're not leaking any sensitive data outside. Okay, next one is: "Why do we need LLMs to do it at all? Isn't it just a scheduled report? What is really transformative about the role that LLMs play?" I'll start and I'll let Hadar add. The biggest unlock we have seen with LLMs is their capability to connect the dots across a multitude of data points when they're given the right context. And this allows them to understand, for example... let's say you just do a SQL query, and then you need to analyze, I don't know, hundreds of pages and hundreds of thousands of rows. It's very

hard for a human to do it at scale, but an LLM can reason through it in an iterative way and also connect the data in an agentic manner, meaning you saw the identity and authentication data from one source, but then you connect it with the endpoint information and the network data to tie together the full picture of who this person is, how they work in their environment, what they really use, and whether this activity is legitimate or not. And it's extremely difficult to do that with regular reports. Okay.

Okay, so: "What are you looking for when looking at physical badge data?" That one is actually great, because if you're tapping into your office every day... For those of you, you know, everyone here is in security; I'm sure everyone has looked at suspicious logins or abnormal activity at some point in their life, maybe 100 times per day, depending on your role. And it's really easy to tell if someone is really in the office when you look at their badge information, right? Because if they're logging in from Thailand today, not via a VPN, for example, and they just tapped into the office one hour ago, it may be a true positive, and vice

versa. Also, you can look for people going in and out of the office at suspicious hours, or someone giving notice (you can pull that from Workday) and then suddenly staying in the office until 10:00 p.m. when they typically stay until 5:00. These are things that are, let's say, more suspicious, that you might want to take into deeper consideration. And then there are multiple other questions on hallucinations. Ah, okay, this is nice: "Is there a benefit to using multiple LLMs to evaluate each other's work?" We do use LLMs as a judge, and we use multiple layers, depending on where in the product, but multiple layers of agentic

evaluation. Sometimes it's five or even seven steps of agents checking each other, and we did find it to be very valuable in terms of increased efficacy. Obviously, it comes with a cost, so it depends on your, let's say, LLM budget and how much you can put into it, but it has definitely proved valuable for us. No more questions here. Do we have any questions from the audience? Question down there, yes. "How do you manage huge amounts of logs, given context windows, and find patterns while keeping within the context window?" So, the question is: if you're using LLMs, how do you manage

processing large amounts of logs and make sure you stay within your LLM's context window? I think there's a reasonable limit. If I know I'm dealing with a giant data source, I'm going to limit my query in some way. But even when I limit it and take only a subset of the data, sometimes I get an amount that is too big. So the sub-agent part that I talked about really helps, because then I can take a specific output instead of all of the outputs of my entire report, and just let it focus on that and

create an output, and then I'll have an agent that's just summarizing everything and combining it. So that's a tip for that. And I'll add: we also use multiple layers to synthesize and filter data in order to give more actionable information to agents. Today we process over 10 petabytes per day, which is a lot: millions, sorry, billions of events every hour. If we just ran LLMs on all of that, we would be bankrupt tomorrow, basically. So we use multiple ways to, let's say, massage the data to make it actually useful and actionable for LLMs. All right, I have one last question here from Slido.
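The chunk-then-combine pattern from the answer above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the speakers' pipeline: the character budget stands in for a token budget, `summarize_chunk` is a hypothetical placeholder for a sub-agent call (here it only counts events per user), and the `user,event` row format is invented for the example.

```python
from collections import Counter
from typing import Callable, Iterable


def chunk_rows(rows: Iterable[str], max_chars: int) -> list[list[str]]:
    """Group rows into chunks whose combined length stays within max_chars.

    A single row longer than max_chars still gets a chunk of its own.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for row in rows:
        # Start a new chunk when adding this row would exceed the budget.
        if current and size + len(row) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(row)
        size += len(row)
    if current:
        chunks.append(current)
    return chunks


def summarize_chunk(rows: list[str]) -> Counter:
    """Placeholder for a sub-agent: count events per user in one chunk."""
    return Counter(row.split(",")[0] for row in rows)


def combine(summaries: Iterable[Counter]) -> Counter:
    """Placeholder for the final summarizing agent: merge per-chunk results."""
    total: Counter = Counter()
    for summary in summaries:
        total.update(summary)
    return total


def summarize_all(
    rows: Iterable[str],
    max_chars: int,
    summarize: Callable[[list[str]], Counter] = summarize_chunk,
) -> Counter:
    """Summarize each chunk separately, then combine the partial results."""
    return combine(summarize(chunk) for chunk in chunk_rows(rows, max_chars))
```

In a real setup the per-chunk step would be an LLM call with its own instructions, and the query itself would already be limited, as described in the answer; the sketch only shows how the work is split so that no single call sees everything.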

"Shouldn't your report flag when an agent reads private data and then calls an external tool in the same session? Isn't that the real blind spot?" So, the agent is not the one making the calls to the data. Essentially, you are, through whatever API you're using. The query stage is separate from the LLM stage. The LLM stage is just looking over the output of the queries and processing that. And also, I would say that almost every security tool you have is reading some amount of data and processing it. So I think it's a known issue in security: you have to exclude your security tools from their own alerting, in a way.
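To make the separation between the query stage and the LLM stage concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the event names in `PRIVILEGED_EVENTS`, the `actor` and `event_type` fields, the output file name, and the prompt wording are hypothetical and do not come from the talk. Stage 1 is plain code that filters for privileged actions and writes the result to a file; stage 2 only reads that file to build the text handed to the LLM.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical event names treated as privileged; adjust to your identity provider.
PRIVILEGED_EVENTS = {"user.role.assign", "policy.update", "user.mfa.reset"}


def query_stage(events: list[dict], out_dir: Path) -> Path:
    """Stage 1: plain code queries the logs and saves the result to a file."""
    counts = Counter(
        e["actor"] for e in events if e["event_type"] in PRIVILEGED_EVENTS
    )
    out_file = out_dir / "privileged_actions.json"
    out_file.write_text(json.dumps(dict(counts), sort_keys=True))
    return out_file


def llm_stage_prompt(out_file: Path) -> str:
    """Stage 2: build the prompt from the saved output only; no log access here."""
    data = out_file.read_text()
    return (
        "Summarize privileged activity per actor. Cite the file name for each "
        f"claim and label inferences as such.\n\nFile: {out_file.name}\n{data}"
    )


# Sample events standing in for real identity logs.
events = [
    {"actor": "admin1", "event_type": "policy.update"},
    {"actor": "svc-readonly", "event_type": "user.role.assign"},
    {"actor": "admin1", "event_type": "user.login"},
]
with tempfile.TemporaryDirectory() as tmp:
    prompt = llm_stage_prompt(query_stage(events, Path(tmp)))
```

Starting from privileged actions rather than from a list of administrators, as suggested earlier in the talk, is what lets an account like the sample `svc-readonly` show up in the output at all.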

And I'll add that one more very popular insight is unauthorized AI usage in the organization. You always find things there, always. Thank you so, so much for coming. Hope this was an insightful session. Really appreciate the time. Have an amazing BSides, and for those of you staying for RSA, enjoy RSA. And thank you again. Really appreciate it. All right, Shachar and Hadar, thank you very much for your time. We greatly appreciate it.
