
Metrics that Matter: how to Choose Cloud Security KPIs For Your Business - Emma Fang

BSides Belfast | 30:20 | 102 views | Published 2025-02 on YouTube
Category: Technical
Style: Talk
About this talk
Abstract: As cloud security operations mature within organizations, implementing effective metrics is vital for measuring cloud security posture and operational readiness. Organizations often face challenges in tracking security metrics without incurring resource overheads. This talk discusses examples of both potentially effective and ineffective metrics based on real-life experiences, tailored to various business scenarios and risk appetites. We will explore how to prioritize metrics that inform leadership and drive continuous improvement in cloud security posture. The session also introduces concepts like the Exploit Prediction Scoring System (EPSS) for prioritizing vulnerability remediation and Protection Level Agreements (PLAs) for building effective KPIs. The goal is not only to measure but to enhance cloud security operations, empowering teams to identify the cloud security metrics that truly matter to their business.

Speaker Bio: Emma Yuan Fang, aka Emma Fang; Senior Cloud Security Architect, Senior Manager at EPAM Systems, CISSP. Emma is a Senior Manager and Enterprise Security Architect at EPAM Systems, a global technology and digital transformation services provider. She is a seasoned cybersecurity professional with expertise in cloud security, DevSecOps and security architecture. In her current role, she designs and implements security solutions for cloud platforms and software development projects for her clients. Formerly at Microsoft, she delivered cybersecurity projects and technical workshops to a broad range of clients, from emerging tech startups to established FTSE 100 firms. Alongside her professional work, Emma is dedicated to promoting a more diverse workforce in cybersecurity through mentorship and community programs. She is the Executive Lead of Women in Cybersecurity (WiCyS) UK&I, a member of the Industry Advisory Board and a guest speaker for the Faculty of Computing at the University of Buckingham.
#bsides #securitybsides #infosec #bsidesbelfast #belfast #bsidesbelfast24
Transcript (English)

So, hi everyone, and thank you for taking the time to join me this afternoon. It's late afternoon, I know we've had a lot of amazing talks today, and I hope you're enjoying the conference so far. Just to warn you, my talk isn't technical, but since it's late afternoon, we can just walk through metrics together. First of all, who am I and why am I here today? I'm Emma, I'm based in London, and I'm a cloud security architect at EPAM, working as a consultant, so I love designing and architecting everything in the cloud for my clients. EPAM is a technology consultancy with a strong engineering focus; we help our clients accelerate their digital transformation programs and integrate security into them. Outside of work, I volunteer with the Women in Cybersecurity UK&I affiliate and at the University of Buckingham.

Today I'm going to talk about metrics. We don't have a huge audience here, but I'd like to ask a question: how many of you think you have a well-defined metrics program in your company? Interesting, and quite expected as well. That's why I'm here today.

There are three questions I'm aiming to answer today: why are metrics important in the conversation about cloud security; what metrics matter to you, your business and your security team; and how can you establish effective and meaningful metrics for your cloud security environment? This is the agenda for today. We have quite a lot of information to go through, so I'll keep an eye on the time, and I hope we can finish within it. At the end of the talk I'll cover the fun part: a couple of case studies from projects I've done for clients in the past, pointing to the different techniques and best practices we adopted to establish KPIs.

This is the definition of cloud security metrics. It's very simple; to be honest, a simple Google search gave me this definition. Don't overthink cloud security metrics: they cover literally everything across multiple domains, such as incident response, identity and access management, software development and so on. You name it: anything within cybersecurity that is cloud-related.

So what are the big metric problems? In my experience with a lot of customers, 90% of them have prioritized security metrics as part of their security program, but they all struggle to act on them. Why is that? These are the problems. Firstly, there is a disconnect between security and business goals, so the board might not see the relevance, or the team doesn't have the budget or the tools to collect, track and handle the metrics. Then there are gaming metrics, just a quick explanation: these are metrics that make your security program look good, which you kind of program and manipulate to make things look good for the board. You've probably had a similar experience; I won't call anyone out, but you probably have something in mind. And sometimes it's just difficult to adapt to the threat landscape. For example, we now have GenAI risk: how can we adapt our metrics to different kinds of technologies so they don't become outdated once we face a new threat? These are all potential metric problems.

So what are the benefits of tracking metrics, and why do we track them? It's not just about measuring security success; more importantly, a metric is a tool to communicate your security risk to your senior stakeholders and management. For example, you want to drive improvements in your security program, you want to measure the effectiveness of your tools, processes and security team, and you also want to track your security posture over time. All of these elements translate into one thing: business risk reduction. At the end of the day, why do we secure our infrastructure and our cloud environment? Ultimately, because we want to reduce business risk.

Before we dive deeper, I'm going to clear up some confusion around metrics and KPIs. The two terms are often used interchangeably, but there are some key differences. Metrics are simply measurements, a snapshot of what's happening right now. Think of them like a thermometer reading: it tells you your current body temperature, but not what that means for your health. KPIs, on the other hand, are goal-oriented, which means they track your progress towards a specific target. Think of them like a finish line telling you how close you are to your health goal. For example, mean time to remediate is a value given by a calculation, so it's a metric; "reduce that number by 25% within the next quarter" is a KPI, because you've set a target and it gives you something to aim for. That's the key difference between the two. There are a number of ways to design your metric program.
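To make the metric/KPI distinction concrete, here is a minimal Python sketch, assuming remediation data is available as (detected, remediated) timestamp pairs. Mean time to remediate (MTTR) is the metric, a snapshot; the KPI wraps it in a target, the 25% quarterly reduction from the example above. The function names and sample data are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_remediate(findings):
    """Metric: average hours from detection to remediation (a snapshot)."""
    return mean((fixed - found) / timedelta(hours=1) for found, fixed in findings)


def kpi_met(baseline_hours, current_hours, target_reduction=0.25):
    """KPI: has MTTR dropped by at least `target_reduction` versus the baseline?"""
    return current_hours <= baseline_hours * (1 - target_reduction)


# Hypothetical (detected, remediated) timestamps for two quarters.
t0 = datetime(2024, 1, 1)
last_quarter = [(t0, t0 + timedelta(hours=48)), (t0, t0 + timedelta(hours=72))]
this_quarter = [(t0, t0 + timedelta(hours=36)), (t0, t0 + timedelta(hours=48))]

baseline = mean_time_to_remediate(last_quarter)  # 60.0 hours
current = mean_time_to_remediate(this_quarter)   # 42.0 hours
print(f"MTTR {baseline:.1f}h -> {current:.1f}h, KPI met: {kpi_met(baseline, current)}")
```

The design point is that `kpi_met` only means something relative to a baseline and a target: the same 42-hour MTTR would be a miss against a 50-hour baseline.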

For time reasons, I won't go into detail on the different ways to design a metric program, because each organization's process is different, but these are the seven steps I would recommend defining and following to establish one.

So how do you identify metrics? I've listed some questions here that you can throw out there to get you started. First, who is going to be the target audience for your metric? If you're showing a metric to your CISO or senior management, it will be different from showing metrics to someone in Cloud Ops. There is a misconception that metrics are just for the CISO, the CIO or senior management, but actually the security team needs to keep up with the metrics too, in order to increase and improve security posture. Next, actionable outcomes: you want metrics that don't just measure but also provide clear insight into where security can be improved. We've talked about adaptability in terms of threat landscape change. And your metric needs context: it should tell a story about your security posture and what to measure.

As for key metric criteria, there are many frameworks you can refer to, and I won't go into detail on them, but the two things I'd like to emphasize are actionable and outcome-driven. Why is actionable important? Because the organization can identify areas where security efforts fall short, so we can take collective action to address the root causes. Outcome-driven means the metric reflects how effectively security policies are followed or enforced, so we can get an actual, actionable outcome from that metric. So now that we understand what qualifies as a key metric, how can we identify the KPIs?

Remember, KPIs are a different thing from metrics. I have two approaches, so let me go through them. The first is the top-down approach. You define the bigger picture: your business goals and business drivers. What do you want to achieve? Is it to increase revenue, or to achieve compliance with regulatory requirements? With that information you can define a KPI and then look at how to measure it through your tools and processes, so you get those metrics from your tools. The bottom-up approach is the other way around. The important thing about it is that it starts by collecting the data points, or metrics, from the tools and processes, and then works all the way up to defining the metrics and the KPIs at the end. If you define the KPI in the first instance, you might not be able to find the right tools to actually establish that KPI, which is why the bottom-up approach is sometimes more appropriate. So which one is better? It depends on the organization and the scenario; ideally, I think it's a combination of the two.

Now, a bit on the causal loop diagram. I guess some of you have already heard of causal loop diagrams, because they have been applied in many other contexts. In the metrics world, a causal loop diagram can highlight positive and negative feedback loops, so we can understand the dynamics of a system and the relationships between metrics. That lets us go beyond looking at individual metrics and explore how they interact with and influence each other. Just the basics of the diagram: we have the metrics as individual data points, and the diagram consists of a set of nodes, which are the variables (our metrics), and edges, with arrows showing the direction of influence between variables, just like the one on the left-hand side. A positive sign indicates a direct relationship, where an increase in one variable leads to an increase in another, and a minus sign shows an inverse relationship, which is the other way around. For example, the number of security incidents and mean time to detect form a loop here, because when we have too many security incidents, we take longer to detect them.
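One way to reason about a causal loop diagram programmatically is to store it as a signed directed graph: +1 for a direct relationship, -1 for an inverse one. A loop's polarity is then the product of its link signs, positive for a reinforcing (positive feedback) loop and negative for a balancing (negative feedback) loop. This is a minimal Python sketch, not the speaker's tooling: only the incidents-to-MTTD link comes from the talk, while the reverse link, the patching loop and all node names are hypothetical.

```python
from math import prod

# Signed edges: (cause, effect) -> +1 (direct) or -1 (inverse) relationship.
LOOP_EDGES = {
    # From the talk: more incidents -> longer mean time to detect.
    ("security_incidents", "mean_time_to_detect"): +1,
    # Assumption: slower detection lets more incidents develop.
    ("mean_time_to_detect", "security_incidents"): +1,
    # Assumption: more open vulnerabilities -> more patching effort.
    ("open_vulnerabilities", "patching_effort"): +1,
    # Assumption: more patching effort -> fewer open vulnerabilities.
    ("patching_effort", "open_vulnerabilities"): -1,
}


def loop_polarity(loop, edges):
    """Classify a closed loop given as nodes, e.g. ["a", "b"] meaning a -> b -> a."""
    links = zip(loop, loop[1:] + loop[:1])
    sign = prod(edges[link] for link in links)
    return "reinforcing" if sign > 0 else "balancing"


print(loop_polarity(["security_incidents", "mean_time_to_detect"], LOOP_EDGES))  # reinforcing
print(loop_polarity(["open_vulnerabilities", "patching_effort"], LOOP_EDGES))    # balancing
```

A reinforcing loop between incidents and detection time is a warning sign that the two metrics can spiral together, which is why the relationship is worth discussing rather than tracking each number in isolation.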

Sometimes that relationship can change depending on the situation, and by discussing the relationships between metrics, we can identify the most important ones for our system. Here is an example I like to use to explain how this works. On my left-hand side I have a diagram of a typical infrastructure-as-code (IaC) workflow: you have your infrastructure code templates stored in GitHub, and sometimes you scan the code; every time you spin up a resource in your environment, it gets evaluated and the configuration is applied to your environment. There are a few metrics I've identified from that workflow. These are just a few, because I don't want to include everything, as it can get quite big, but what I want to demonstrate here is the relationship between them. So the question is: what are the ideal candidates for our key metrics? Look at the policy violation rate and the average time to remediate. As you can see, they have multiple incoming influences, but they are not direct influencers themselves, and they are quite quantifiable and measurable, so they might be our key metrics and KPIs. By understanding the relationships between different metrics, we can prioritize which ones are most critical to monitor and improve. This is how we identified the key metrics in this instance, by following the causal loop diagram.
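The selection rule described for the IaC example (many incoming influences, no outgoing ones) can be expressed as a degree count over the diagram's edges. This Python sketch assumes the causal loop diagram is available as a list of (cause, effect) pairs; apart from the policy violation rate and average time to remediate, the metric names, the edges and the `min_incoming` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical influence edges for the IaC workflow (cause -> effect).
# Only policy_violation_rate and avg_time_to_remediate are named in the talk.
IAC_EDGES = [
    ("iac_scan_coverage", "policy_violation_rate"),
    ("templates_without_review", "policy_violation_rate"),
    ("deployments_per_week", "policy_violation_rate"),
    ("open_scan_findings", "avg_time_to_remediate"),
    ("team_capacity", "avg_time_to_remediate"),
    ("deployments_per_week", "open_scan_findings"),
    ("iac_scan_coverage", "open_scan_findings"),
]


def key_metric_candidates(edges, min_incoming=2):
    """Metrics influenced by several others that do not influence anything themselves."""
    incoming = Counter(effect for _, effect in edges)
    outgoing = Counter(cause for cause, _ in edges)
    return sorted(
        metric
        for metric, count in incoming.items()
        if count >= min_incoming and outgoing[metric] == 0
    )


print(key_metric_candidates(IAC_EDGES))
```

Here `open_scan_findings` is excluded even though it has two incoming links, because it still drives another metric; that matches the "not a direct influencer" condition. Whether a candidate is quantifiable still needs a human check.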

Now I'd like to explain a little about the protection level agreement (PLA), another technique I've applied before with a client. Now that we have identified our key metrics, how about establishing a target? The protection level agreement approach was researched and published by Gartner, and it provides a way to translate security into the language of business. Think of it like a service level agreement, but an agreement about protection between the security team and the business stakeholders or leaders. The business decides how much budget it would like to invest in protecting a certain cloud environment, and the security team aligns its efforts with that level of protection. This is crucial for balancing business needs and security requirements, given an understanding of your risk appetite.

So how does this work? There is a five-step process. First, we figure out what actually matters to the business and what its risk appetite is. Then we look at what we've got in the cloud environment; here we did an inventory of our cloud environment. Then we identify which assets are most critical. Next, we discuss with the business and determine the desired protection level for those critical assets, and we map the security controls to them at the appropriate protection level. At the end of it, we establish the PLAs using those outcome-driven metrics.

This is how it looks in practice. In this slide we use PLAs to prioritize remediation effort; think about time to patch your vulnerabilities. Here we have scopes, which are the different markets, such as consumer and insurance, plus an internal scope, and within each scope we also have different systems: critical systems and other internal systems. We define a PLA where N is the number of days to patch, based on the criticality of those systems. After a month, we observed what actually came out of implementing that protection level agreement and realized that some scopes needed longer to patch and some needed less time. So now there is a decision available to our security team: we either speak to the business leaders and increase our budget, or we adjust the protection level to a lower one by accepting some risk in the patching gap. This tells us what the gaps are between the current state and the goals we agreed with our business leaders.
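The PLA patching example boils down to comparing an agreed target N (days to patch) per scope and system tier with what was actually observed. This is a minimal Python sketch, assuming observed patch times are already aggregated per scope and tier; the scope names echo the slide, but every N value and observation here is hypothetical, since the talk does not publish them.

```python
from dataclasses import dataclass


@dataclass
class PLA:
    scope: str        # market or internal scope
    system_tier: str  # e.g. critical vs other systems
    target_days: int  # N: agreed maximum number of days to patch


# Hypothetical agreements and one month of hypothetical observations.
agreements = [
    PLA("consumer", "critical", 7),
    PLA("insurance", "critical", 7),
    PLA("internal", "other", 30),
]
observed_days = {
    ("consumer", "critical"): 12,
    ("insurance", "critical"): 5,
    ("internal", "other"): 21,
}


def pla_gaps(agreements, observed):
    """Return (scope, tier, days_over_target) wherever observed patch time exceeds N."""
    gaps = []
    for pla in agreements:
        actual = observed[(pla.scope, pla.system_tier)]
        if actual > pla.target_days:
            gaps.append((pla.scope, pla.system_tier, actual - pla.target_days))
    return gaps


for scope, tier, over in pla_gaps(agreements, observed_days):
    print(f"{scope}/{tier}: {over} days over target -> increase budget or agree a lower protection level")
```

Each reported gap is the trigger for the business conversation described in the talk: increase the budget, or formally accept a lower protection level for that scope.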

It's a conversation between the security team and the business.

Now I'd like to show you some examples of metrics I identified in my past projects. I'd call this the fun part, because we've gone over all the best practices and techniques. The first case study is an energy supply operations company. They have an uptime SLA requirement of 99%, their goal is to leverage the cloud for operational efficiency, and they are on their journey to the cloud. They have a small SecOps team, but without clearly defined roles and responsibilities. The key problem is that they have over 70 metrics in their systems, but they struggle to identify the KPIs. After a conversation with my client, I found out that they wanted to present the KPIs to their board, the senior management and the CIOs. Once we understood that requirement for our metrics, we could work on how to prioritize KPIs for reporting to the board.

I'd like to show you some bad examples of KPIs they established initially that didn't quite work. For example, the number of vulnerabilities and alerts detected. I know a lot of you are probably familiar with that, because we all track it in our systems, right? Then we present it to the CISO or the CIO, and they ask: what does that number mean? It doesn't give me an action. It doesn't give you the context of the risk or how much security investment is actually needed, and sometimes we have false positives in our security tools, so simply tracking that number is not actionable. Another example is vulnerability metrics based on the CVSS score. CVSS scores are used by a lot of vulnerability management systems for their risk scoring, but tracking only that won't give you a realistic view of whether those vulnerabilities are actually being exploited in the wild, so we need a better tool than simply tracking those metrics.

Apologies in advance for all the content on this page, but please don't copy it, because these metrics only apply to my client; each company needs a different story. In this specific case, they wanted to understand their security posture, and the KPIs were for the board. What we suggested was to use the cloud security posture management (CSPM) tool. If you're using a CSPM tool, you'll know it gives you a security score, the CSPM score, so we can set a target of 80%, and we can also adjust that target for specific projects: some critical projects might require the higher score, while non-critical projects need to meet the minimum standard, which we set at 60%. Another good example is coverage: we want to know what proportion of production cloud workloads are aligned with the approved security baseline. That is an example of an actionable outcome: by tracking it, we know how many cloud workloads are not aligned, and we can identify the issues causing that gap. These are the other metrics we included in the project to track, and to be honest, the board only needs six metrics on their dashboard.
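The case study's tiered CSPM targets and baseline coverage metric are easy to compute once the data is exported. This Python sketch assumes per-project CSPM scores (0 to 100) and a per-workload baseline flag have already been pulled from your tooling; the 80% and 60% thresholds come from the talk, while project names, workload names and scores are hypothetical.

```python
# Tiered CSPM score targets from the case study: 80% for critical projects,
# a 60% minimum standard for non-critical ones.
TARGETS = {"critical": 80, "non_critical": 60}

# Hypothetical project scores (not the client's data).
projects = [
    {"name": "grid-telemetry", "tier": "critical", "cspm_score": 84},
    {"name": "billing", "tier": "critical", "cspm_score": 71},
    {"name": "intranet", "tier": "non_critical", "cspm_score": 63},
]
# Hypothetical flags: True if a production workload matches the approved baseline.
workload_aligned = {"vm-01": True, "vm-02": True, "web-prod": False, "db-prod": True}


def projects_below_target(projects, targets):
    """Projects whose CSPM score misses the target for their tier."""
    return [p["name"] for p in projects if p["cspm_score"] < targets[p["tier"]]]


def baseline_coverage(aligned):
    """Percentage of production workloads aligned with the approved baseline."""
    return 100 * sum(aligned.values()) / len(aligned)


def misaligned(aligned):
    """Workloads behind the coverage gap: the actionable part of the metric."""
    return sorted(name for name, ok in aligned.items() if not ok)


print(projects_below_target(projects, TARGETS))                           # ['billing']
print(f"Baseline coverage: {baseline_coverage(workload_aligned):.0f}%")  # 75%
print(misaligned(workload_aligned))                                       # ['web-prod']
```

Returning the misaligned workloads, not just the percentage, is what makes coverage actionable: it points straight at the workloads causing the gap.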

They don't need all 70, so we want to identify and prioritize the important KPIs.

This is the second example, and I'm just going to go through it quickly. This is a much smaller company. They had experienced security issues in the past, and they wanted to establish a DevSecOps culture and shift left on their application security. What we found in this case is that they had a lot of tools: SAST, DAST and backlog management tools. So we suggested the bottom-up approach, because they had a lot of tools but didn't know why they were tracking the metrics. We started by looking at the tools they had and the kinds of metrics they had collected, then used the causal loop diagram to establish the relationships between those metrics, and identified which metrics are influenced by a lot of other metrics, since those can be significant. In the end, we established metrics focusing on their application security and their cloud workloads, because they wanted to understand how efficiently their team was fixing the issues found by the scans. That's why we tracked, for example, production code that passed automated security and quality tests. Another thing is that these metrics were not for the board: in this case they wanted to show the metrics to their developers and the Cloud Ops team, which is why we focused on getting those compliance scores in place as well. And even if the audience changes, they can still be useful for board members. So this is just an example of the potential metrics we could establish.

So, what is EPSS?

As we mentioned before, CVSS might have a gap. EPSS, the Exploit Prediction Scoring System, is a probability score between zero and one that represents the likelihood of exploitation. Coverage is a percentage, and the right level depends on your risk tolerance and resource constraints, so there's a balance between the efficiency and the coverage you want for your vulnerability management: the higher the EPSS threshold, the higher the efficiency and the lower the coverage. That's the trade-off. Here is an example of how I've used EPSS in the patching example: we used EPSS to further improve our KPI. For a company with a higher risk tolerance, we can define an EPSS threshold of 70%, so we only track the vulnerabilities that actually score above it. However, bear in mind that EPSS also has a drawback: you have to track it constantly, because threats change over time, so you need to keep pulling the latest EPSS feeds into your vulnerability management tools. The advice is to automate that process.
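The efficiency/coverage trade-off can be checked against historical data. Following how FIRST describes EPSS evaluation, efficiency is the share of prioritized vulnerabilities that were actually exploited (roughly precision), and coverage is the share of exploited vulnerabilities that were prioritized (roughly recall). In this minimal Python sketch the scores and exploitation flags are made up; in practice the scores would come from a regularly refreshed EPSS feed, as the talk advises.

```python
# Hypothetical vulnerabilities: (EPSS probability, later observed exploited in the wild).
VULNS = [
    (0.95, True), (0.80, True), (0.72, False),
    (0.45, True), (0.35, False), (0.20, False), (0.05, True),
]


def efficiency_and_coverage(vulns, threshold):
    """Efficiency: share of prioritized vulns that were exploited.
    Coverage: share of exploited vulns that were prioritized."""
    prioritized = [exploited for score, exploited in vulns if score >= threshold]
    hits = sum(prioritized)
    total_exploited = sum(exploited for _, exploited in vulns)
    efficiency = hits / len(prioritized) if prioritized else 0.0
    coverage = hits / total_exploited if total_exploited else 0.0
    return efficiency, coverage


for threshold in (0.1, 0.7):
    eff, cov = efficiency_and_coverage(VULNS, threshold)
    print(f"threshold {threshold:.0%}: efficiency {eff:.0%}, coverage {cov:.0%}")
```

With these made-up numbers, raising the threshold from 10% to 70% lifts efficiency from 50% to about 67% while coverage drops from 75% to 50%, which is the trade-off described above.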

Okay, so we are nearing the end of the talk. This page is just a set of suggested tools where you can collect your metrics; as you can see, it's pretty much everything in the cloud. These are some examples that we have either recommended or that I've simply put here to make the slide look pretty. In terms of tracking your metrics, you also need visualization tools, and these are some visualization tools that would be good to start with.

So here are the key takeaways, just a few tips to get you started on your metrics program. Firstly, KPIs are not just for security teams; they are business decisions, so aligning your metrics with your organization's specific goals and risk tolerance is very important. There's no one-size-fits-all solution, so you need to customize your metrics to your unique environment and needs. Use the protection level agreement to shift responsibility for security to your organization's business decision makers. Your metrics should also tell a story about your security posture and its progress over time. And finally, don't get lost in the numbers: look for trends, patterns and actionable insights in your metrics, and start with something small and grow it into something bigger, otherwise it gets too overwhelming. That's my talk. I guess there's no time for questions, but you're welcome to ask me in person if you like. Oh, sorry, we have a couple of minutes.