
Lost in Translation? Making Pentest Reports Speak the Client's Language

BSides Tallinn · 2025 · 36:54 · 131 views · Published 2025-10 · Watch on YouTube ↗
About this talk
Penetration testing reports are often the only tangible deliverable clients receive, yet research shows widespread usability gaps that prevent organizations from acting on findings. This talk presents an empirical study examining how clients understand, prioritize, and implement pentesting recommendations, comparing human-written versus AI-generated security advice and exploring how structural elements like role tagging improve actionability.
Original YouTube description
Lost in Translation? Making Pentest Reports Speak the Client's Language

Link to slides: https://docs.google.com/presentation/d/1IHq1-923U5SzDZSffDQhvI9QtaG7mAYtZX6Lruh4Uoc/edit?usp=sharing

Penetration testing reports play a significant role in helping organizations identify and mitigate security vulnerabilities, as they are the only tangible product of the conducted tests. The report's effectiveness relies on the extent to which customers can translate the findings into actionable decisions. Our study investigated the usability gaps in penetration testing reports from a customer-centric perspective, focusing on the challenges organizations face in understanding, prioritizing, and acting on the provided insights. Want to know how to improve your reports? Join us and find out!

Sneak peek: K. Galanska, A. Kruzikova, M. P. Murumaa, V. Matyas, M. Just, "From Reports to Actions: Bridging the Customer Usability Gap in Penetration Testing," IEEE Access, vol. 13, pp. 73975-73986, 15.04.2025, doi: 10.1109/ACCESS.2025.3561220
Transcript [en]

The first talk on stage two is about pentest reports. Please cue the intro.

[Music]

So, hi. I invite you to join us in a little mental exercise. Imagine you just finished an epic pentest assignment. The findings were great, you put that extra detail into writing the report, and you feel proud of your work; you think it's your best report so far. Now comes the time to hand the report over to the client. Usually a PDF or doc file is the only tangible product the client gets, and the only thing that shows the work you did in the project. So it is extra important that everything you write down is understandable, so the client can turn the recommendations into actionable decisions.

You go over the report, introduce the findings, and suggest some recommendations on what to do next. The client nods; they understand that maybe the situation is not the best. But then comes silence. The client doesn't actually understand what to do next. They might not know who to hand the recommendations to for implementation, or they might think those are just a bunch of low-level issues anyway, so why even try to solve them? Simply because the client is lost in translation, these security vulnerabilities will not get fixed. Today we are going to help you tackle that issue. We did research on how to improve the usability of pentest reports, and we focused on the client-side perspective.

Our research is part of the CHESS project, which aims to bring industry specialists and academic researchers together to solve real-life cybersecurity issues. So, hi, I am Maria. I work as a cybersecurity engineer at Cybernetica, and my main focus is penetration testing and analyzing complex security-sensitive systems. With me today is Katarina.

Hello. My name is Katarina. I am a researcher from Masaryk University in Brno, Czechia. It is my first time talking to people wearing headsets, so it's an amazing experience. I love the combination of cybersecurity and the human element; that's why I study usable security. I am not only an academic, I also work in industry, and I used to be more on the offensive side: I used to work as an ethical hacker. Now I am more on the protecting and defending side. When I was a pentester, I often experienced the situation where we handed a report over to the customer and it came back with a request for further explanation: how should we interpret the results, and what should we really do to mitigate them? It struck me that this is an interesting topic to study: does anybody give it any focus? I looked at different resources to see whether anybody was doing anything on the usability of pentest reports, and I found out that there are standards and directives saying "do a pentest", but nobody says how to do it. So it opened up an area for research.

So we decided to do an introductory study on how to approach an area that had not been explored before. We wanted to get the opinions of the people who are the readers, the consumers of the report. We gathered professionals from industry, from both Czechia and Estonia, and we wanted to ask where the pain points are. To do it in a slightly different way: we know that what clients usually see is the report itself, but they may not know exactly how the process of pentesting is done. Of course we cannot show them a full week of pentesting, so we decided to do a demonstration of selected findings, some exploitation, and so on. We built a demo application and started with some reconnaissance; we intercepted the HTTPS traffic, found different vulnerabilities, and then exploited them. Then we gave them a report, but of course not a fifty-page-long one: we gave them one specific finding that had been exploited in front of them, and asked: is this what you would expect, and is this something you would know how to act on? We wanted them to rate the finding and also the report structure, so we used a survey to gather their opinions. But we thought: okay, with a survey we know how they rate this specific finding, and that is not enough. We wanted to add something more and actually talk to people, to really discuss where the problems are in their opinion. So we ran focus groups: you can imagine it as talking to people around a table, seven or eight professionals, with a moderator leading a structured discussion. The moderator aims not to influence the participants but really just to gather their opinions. It was really interesting to see the different perceptions, because we had cybersecurity managers, who are the decision makers, and we had developers, the ones who write the code, and what they find important is different.

Today we are not going to deep-dive into everything we found out; we are going to focus on things that you can implement today to improve your pentest reports. But first, let's start with the positives. We found out that the report structure is actually really useful for the client. Although it is not standardized, the usual flow of having an executive summary, a methodology, a scope, and so on is really useful for both managerial and technical roles. The managers focus on the description of how the pentest was conducted, while the more technical roles look only at the issue descriptions, the step-by-step ways to recreate each issue, and of course the recommendations. One positive thing that clients really highlighted is that the reports boost their confidence: they have concrete evidence of where their security posture stands, and if they implement the recommendations they can raise their security maturity. So they were really positive and happy that penetration testing is part of their workflow. Today we are going to focus on two improvement areas, the two topics that received the most suggestions for improvement: format and recommendations. Regarding formatting, one thing that clients really miss is positive findings.

Pentest reports tend to be highly negative: they only state the issues you found, but they don't clearly state what was actually done well. When there is a section that goes over functionality or parts of the application that were not vulnerable or were implemented quite nicely, the client gets a pat on the back, a "job well done". Positive findings also help prove to management that the investments in those areas were made well, so investing in security pays off in the long run.

Usually the pentest report itself is a PDF or doc file. While that is quite useful for management, who can go over the visuals and get a feeling for how things stand, it is not really useful for the technical people. Usually the workflow involves one person who takes the PDF file and copy-pastes all the recommendations and issue descriptions into a ticketing system, which is a lot of manual labor. If we could include some machine-readable output alongside the PDF, that would be really useful and would improve efficiency. For example, providing a JSON, CSV, or even XML file would allow the client to import the findings and the steps to take into a ticketing system such as Jira or Confluence, move them to the backlog, and help with task division and prioritization (a minimal sketch of such an export is shown below).
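
As a rough illustration of the machine-readability idea, here is a minimal sketch, not taken from the study, that converts a hypothetical JSON findings export into Jira-style ticket dictionaries and a CSV. The schema (fields like `id`, `severity`, `responsible_role`) and the file name are assumptions made for the example.

```python
import csv
import json

# Hypothetical export a pentest team might ship next to the PDF report.
# The field names are illustrative, not a standard schema.
FINDINGS_JSON = """
[
  {"id": "F-01", "title": "Admin interface exposed at /admin",
   "severity": "High", "responsible_role": "backend developer",
   "recommendation": "Require authentication and normalize request paths."},
  {"id": "F-02", "title": "Information leakage from API endpoint",
   "severity": "Low", "responsible_role": "backend developer",
   "recommendation": "Mask sensitive values and add access control."}
]
"""

def to_jira_issues(findings):
    """Map findings to dicts shaped like Jira issue payloads (sketch only)."""
    severity_to_priority = {"High": "High", "Medium": "Medium", "Low": "Low"}
    return [
        {
            "summary": f"[{f['id']}] {f['title']}",
            "description": f["recommendation"],
            "priority": severity_to_priority.get(f["severity"], "Medium"),
            "labels": [f["responsible_role"].replace(" ", "-")],
        }
        for f in findings
    ]

def to_csv(findings, path="findings.csv"):
    """Flatten the same findings into a CSV for teams without a ticketing API."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=findings[0].keys())
        writer.writeheader()
        writer.writerows(findings)

if __name__ == "__main__":
    findings = json.loads(FINDINGS_JSON)
    for issue in to_jira_issues(findings):
        print(issue)  # in practice this would go to the ticketing system's API
    to_csv(findings)
```

The point is only that the same content already in the report can be shipped in a form the client's tooling can ingest directly, instead of being copy-pasted by hand.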

Also, one finding that many different participants agreed on is that the advice given should depend on the size of the company. We interviewed representatives of both small and medium-sized companies as well as larger organizations. The smaller or medium companies often don't have a dedicated security person or team, so usually a developer or engineer has to take the recommendations and do something with them. Whether the advice is easy to understand is really key there, and it also helps when the report clearly highlights what should be prioritized and what can be tackled when there is time, so they can plan the prioritization in the long run. Different companies have different kinds of teams and departments, though, so this custom-made, tailored approach is not as necessary for larger corporations, because their security team is going to tackle it by themselves.

And now let's hear about recommendations. When I was a pentester, I once witnessed a project that was a little bit eye-opening for me. There was a website with an admin interface accessible at /admin after the domain. Of course we reported this, and as it usually goes, they took some time to repair and mitigate it and then asked for a retest. We retested it: /admin was no longer accessible. But a curious pentester who knows how URLs can be written will also try variations like //admin, and of course we tried it and it was accessible, because the security control they had put in place matched only the exact path /admin. So the process went again: we reported it, they asked for a retest once more, and we already knew what we were going to test. We tried three slashes and accessed the admin interface again. At that point I started to think that we must be doing something wrong: either we are giving very bad security advice, or they are doing a very bad job of securing the application. That was one of the moments when I realized that security controls are important, but they must be implemented correctly, and that from a pentester's perspective we should really focus on writing the recommendations in the best way possible.
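
To make the anecdote concrete, here is a minimal sketch of the kind of bug they likely hit, assuming the control was a simple string comparison on the raw request path; the route name and the normalization fix are illustrative and not taken from the actual engagement.

```python
import posixpath
from urllib.parse import urlsplit

def is_admin_blocked_naive(raw_path: str) -> bool:
    # Naive control: blocks only the literal "/admin" path.
    return raw_path == "/admin"

def is_admin_blocked_normalized(raw_path: str) -> bool:
    # Normalize the path before the access-control decision:
    # collapse duplicate slashes and resolve "." / ".." segments.
    path = urlsplit(raw_path).path
    normalized = posixpath.normpath("/" + path.lstrip("/"))
    return normalized == "/admin" or normalized.startswith("/admin/")

for attempt in ["/admin", "//admin", "///admin", "/admin/../admin"]:
    print(attempt,
          "naive blocks:", is_admin_blocked_naive(attempt),
          "normalized blocks:", is_admin_blocked_normalized(attempt))
```

The lesson from the story, in code form: a recommendation should spell out that the check has to run on the normalized path, not just say "protect /admin".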

Of course there are some restrictions: it is not a white-box engagement, so we do not see everything, but we should do our best to explain what is going on and what to do. Back to the study: during the focus groups we discussed whether it is even the pentester's job to give advice on how to fix things; isn't the pentester's job to attack? In the end we came to the conclusion that the recommendation is a first step for the company, for the person doing the mitigation, to have something to start from. So it's good to have something, and luckily we can still do the research and continue. The participants also suggested ways to improve how recommendations are given. One improvement is to provide multiple mitigations. Again, the pentester doesn't know everything: he or she gives one piece of advice, and maybe it's not in the budget or not feasible time-wise, so clients wanted multiple options for how to mitigate an issue, not just one thing. They also wanted to know, if they apply a given mitigation, what impact it will have: is it going to reduce system performance, is it going to require downtime, are there any other restrictions they should know about? I would say that was mostly for decision makers and managers; for technical people the ideal would be a step-by-step "what exactly should I do", but when a managerial person reads it, what they need to know is whether it is going to affect the business somehow. When we talked about multiple mitigations, there was also a need to mark a preferred one: there could be a short-term fix, I don't want to say hotfix, and there could be an ideal fix, and they wanted to know which is which so they can discuss and apply what is best for them. And as we know, a report is read by multiple readers, multiple target groups; it is not a document read by just one person. Maybe it is if the company is very small, but usually it goes from the managers down to someone actually writing code on the first floor. So, when readers look at a recommendation, they would like to know who we wrote it for: the technical person, the developer, or the one responsible for the configuration files?

So they wanted something like a tag: when we give a recommendation saying "this is what you should do", we could add a tag saying "this is a dev issue" or "this is a config issue". Then, when they process the report, they know which department or which person to send it to, or who to assign the Jira ticket to (a sketch of such a tagged recommendation follows below). That was a selection of results from our introductory study.
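
Here is a minimal sketch, with invented field names, of what a single recommendation could look like when it carries the elements the focus groups asked for: multiple mitigation options, a preferred-fix flag, impact notes, and a responsible-role tag.

```python
from dataclasses import dataclass, field

@dataclass
class Mitigation:
    description: str
    preferred: bool = False   # marks the ideal fix vs. a short-term one
    impact: str = ""          # e.g. downtime, performance cost

@dataclass
class Recommendation:
    finding_id: str
    responsible_role: str     # tag used to route the ticket
    mitigations: list = field(default_factory=list)

rec = Recommendation(
    finding_id="F-01",
    responsible_role="backend developer",
    mitigations=[
        Mitigation("Put the admin interface behind authentication and "
                   "normalize request paths before access-control checks.",
                   preferred=True,
                   impact="Requires a short maintenance window for deployment."),
        Mitigation("Restrict /admin to the office IP range at the reverse proxy "
                   "as a short-term measure.",
                   impact="No downtime, but does not protect remote admins."),
    ],
)

# Route the ticket based on the role tag, and show the preferred fix first.
print(f"Assign to: {rec.responsible_role}")
for m in sorted(rec.mitigations, key=lambda m: not m.preferred):
    label = "preferred" if m.preferred else "alternative"
    print(f"- ({label}) {m.description} Impact: {m.impact}")
```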

It was interesting that when we let people rate the finding they received, they rated the recommendation part the worst with respect to clarity. That is why we decided to focus more on the security advice itself, on what we are advising, and we chose some selected features we wanted to test. We also knew it is 2025 and AI is on the rise everywhere, so how could we not include AI in our research? We wanted to know whether we as pentesters are replaceable in some way, at least partially. So we wanted to compare human versus AI recommendations, and we added two selected features. We had five versions of a recommendation for one finding, and only one was written by a pentester; the rest were AI-generated, but they differed in level of detail: whether it was a very short one, or whether multiple mitigations were mentioned in a more step-by-step way. We also wanted to see if tagging affects how people perceive it, so two of the versions carried a tag like "responsible role: backend developer", and we wanted to know how that affects people. So we created five versions, gathered participants, and randomly gave each person two of them. Sample-wise we did a bit better than in the first study: we started with 200 cybersecurity students.

It was interesting, if we compare them with IT professionals, that the confidence of the students, who are, let's say, junior cybersecurity people, affects how they perceive things and what they decide they would act on. Usually it was the easy-to-understand option; even if something was explained in more detail, they would still choose something that is easy rather than secure. We also got feedback from more than 200 IT professionals, directly from companies, but mostly at DevConf in Brno, organized by Red Hat, where Maria and I had a booth. We spent three days talking to people; it was great fun, and we managed to gather their opinions. We are still in the process of analyzing the data, so this is a sneak peek of what we are doing right now, but we can already see some patterns. One is this: we wanted to know both how clear a recommendation is and how actionable it is, and those are not the same. I would say that what reads well isn't always what enables action; something can be clear without being something you could base a mitigation on. Another pattern answers the question of whether we are replaceable.

We had two different findings, one of lower severity and one of medium severity. The lower-severity one was information leakage from an API endpoint, where the basic recommendation was to mask the leaked values with asterisks, and the more thorough recommendation was to add access control and so on. It was rather easy to understand for the students as well as for the professionals. The other one was a cross-site scripting finding, which I would say is more complex, so it was less understandable, and the recommendation was harder to follow. The result is that for the easy finding, AI was very good at producing the recommendation, but when the finding was a bit more complex, people actually preferred the human-written recommendation. Sample-wise it is not such a big sample yet, but from what we have, this is what we can see, which is interesting.
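
For the lower-severity finding, the two levels of advice the speakers describe (mask the values versus add access control) could look roughly like this minimal sketch; the endpoint data, field names, and role check are invented for illustration.

```python
SENSITIVE_FIELDS = {"api_key", "email", "internal_id"}

def mask_value(value: str, visible: int = 4) -> str:
    """Basic recommendation: mask all but the last few characters."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]

def render_response(record: dict, caller_role: str) -> dict:
    """Fuller recommendation: only privileged callers see unmasked data."""
    if caller_role == "admin":  # illustrative access-control check
        return dict(record)
    return {
        key: mask_value(str(value)) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }

record = {"user": "alice", "email": "alice@example.com", "api_key": "sk-1234567890"}
print(render_response(record, caller_role="support"))
print(render_response(record, caller_role="admin"))
```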

So today we gave only a sneak peek of the results of our first paper. If you scan the QR code here, you can access the paper and see everything that we did, the methodology and all the findings; I promise you it's worth a read. The second study we are still developing, so if you want to be the first to know whether AI will replace us all, and how AI could be integrated into your workflow, then definitely write us a message; our email addresses are on the next slide. When we publish our research, you will be the first to know. But yeah, thank you for coming to our talk, and we are happy to answer any questions you may have. [Applause]

>> Any questions? We will pass the microphone. Yes. And please put on your headphones so you can hear the headset.
>> All right. Hello. Okay, I hear myself as well; I should turn this off. Hello. Hi.
>> Thank you very much for the research; I actually needed to hear something like this. What I've seen from many reports is that pentesters don't really evaluate the business perspective of the client's scope.

For example, allowing multiple sessions may be a real problem for a banking app, but it's actually not a problem for a social media app, because users use it on the mobile app and the website at the same time, so multiple sessions have to be allowed. Yet the same vulnerability gets reported for different apps, and clients sometimes take it very seriously: okay, we have to fix even the low ones, because the technical team wants to look good to the management board, so every vulnerability in the report should be fixed. In that scenario we, for example, come into conflict with the client, because they say, okay, we cannot do this, but the pentester already wrote it into the report, so we can't just remove it either, or it doesn't make sense to remove it. Did you do any research about this, or do you have any data, or any complaints about it from the people you have talked to so far?
>> I can give a little intro on that. Basically, for this I think the multiple mitigation measures, so multiple recommendations, would actually be suitable, as well as including tags for who an issue should go to. Many of these recommendations should first go to management to decide on business-flow changes.

For example, should these multiple sessions actually be allowed or not? If penetration testing reports could include these little hints, like "before you implement this, talk with your product owner or your management about whether this is actually the desired outcome", it would really help and would limit how often the not-so-useful recommendations get implemented.
>> If I understood the question correctly, I think it is a matter of the rules of engagement and scope definition. We had many people complaining about it, mostly managerial people. I would say it's more of a process problem; research-wise we did not focus on it, but we included in the paper the opinions and what people said. It is a complex problem, but yes, it happened many times: people complained that they explained to the pentester what the scope is, what the constraints are, and where to focus, and the pentester still did the basic thing, as if it were any other application. So yes, it happens.
>> I mean, the pentest as such has its limitations; I think the pentester cannot understand everything, but it could be limited scope-wise, and I think the internal analysis of the report will always be important.

For example, when I work with reports, sometimes the pentester reports something and in many cases I can see: it's not applicable, it doesn't affect us, or it's not what we care about because we have other measures that mitigate it.
>> Yeah, I mainly ask because of cases like an XSS in a static application: do you report that? That's maybe a more straightforward way to ask it. It's a static application, there is no functionality in it, it's just hosted on GitHub Pages, and you find an XSS. Do you report it? That's the main mentality behind the question.
>> In my experience it would be reported as informative, because when you report something you have to say what the risk is, and if there is no risk then maybe you can put it in as an informative finding; it should not be reported as a high or critical issue.
>> Yeah, I've seen it reported as high a lot of times, because it's treated as a generic issue, like "XSS is at least medium, so it should be medium or high", without actually looking at the application's functionality.
>> We did not have those cases.
>> Yeah, we focused more on the usability part: if the clients already have some results, what can they do with them? But this is definitely a good topic that we could explore, maybe in our third research project. So thank you.

>> Yeah, thank you. What I've seen is that clients are also suffering from this, so maybe it could be added to the research. Thank you.
>> Thank you. Thank you.

>> Hey, testing. Okay, you can hear me? Okay, great. Thanks. Really interesting report, and I love the customer-service approach that you're taking. It seems to me, and maybe your future publication will talk about this, in which case just tell me to wait and read it, that the real opportunity of adding all the detail about who should be concerned, the approaches they might take, timelines and so on, is the opportunity to really keep the human in pentesting, right? That's the value that we add. Can you speak to that a little bit?
>> Yeah, one of the motivations of our current research on how we could integrate AI is to take away the tedious tasks that don't need so much human interaction and can be automated, so that we can focus more on really customizing the pentest we are doing and on writing quality guidance for the client: helping with prioritization, helping with understanding the impact. That is one of the goals of the new paper, so this is just a sneak peek to get you excited.

>> Yeah, I have a non-technical question. I have witnessed several pentest presentations to C-level management. A company does the pentest, there is typically a C-level management presentation, and typically the CISO, the chief information security officer, is also in the room. Typically the CISO uses the chance to tell management, "I have told you for years, please give me more money," and so on. But in maybe 5% of the cases I have witnessed, the CISO goes into defense mode and says, "What you have done is absolutely not true, you have done it wrongly, it's crap." How do you address that situation? I know it's not technical, it's more psychological.
>> I think it's a bit dependent on the client. If they see the pentest as an opportunity to show management why further investments into security are really necessary, then we as pentesters can be your voice and help you get your message across. However, if the CISO or other managerial roles are more defensive and need the pentest to be a success to prove, for example, that they are PCI DSS compliant, then we get into tricky territory. So I suggest that pentesters point out that retests are possible and that we are here to help fix these issues: for example, if you solve these moderate-level issues and bring them down to low, we can update the report, you will have a better feeling about it, and you can also show that you, as a manager, managed to get the moderate-level issues fixed or brought down to low or informational ones. This might be a good approach to follow.

>> Hello. Thank you. Do I understand correctly that one of the outcomes of this paper, or at least of the results, is that you advocate that all pentesting should have a non-technical element? Because, if I understood correctly, one of the things the customers really wanted was for the report to be usable for their specific company, and that would require somebody on the pentesting team to look into the business side of things and into how the company is structured in order to give this advice. So do I understand correctly that this is one of your outcomes, and do you think this is something that should happen? Is it your personal take that all pentesting teams should have a strong non-technical element involved? Thank you.
>> I would say that even though it is not defined what a pentest report should look like, there are best practices, and usually there is an executive summary that is meant for managerial people. People in our focus groups mentioned that this is where they go first when they start reading, because, as you said, it is written non-technically, more along the lines of "there are high findings, this is the risk, and this is what can happen if it is exploited". But it does not go into too much detail about the client's business strategy, logic, and risk levels, because pentesters don't know the risk management and other internals of the company.

However, as I said, the executive summary is there, and managerial people want to keep it, because it is what they read.
>> So I take it your answer to my question is no, because what I gathered from your presentation is that the interest is in more detailed, company-specific advice, but what you just described is a general executive summary of the technical findings.
>> The methodology, and maybe also the executive summary and the positive findings, can be tailored to the client. Clients actually brought out in the focus groups that they can spot when you just run a vulnerability scan and copy-paste the output, and they really want a custom-tailored approach. As one of the slides pointed out, smaller companies really need help with these business decisions and with integrating the findings; larger ones maybe not so much.

>> Last question.
>> I'm just curious about the AI reporting aspect. How did you choose the model? Did you do any fine-tuning or retraining, etc.? Can you describe that process?
>> To answer that, we actually considered training a special AI or building something of our own, but in the end we considered reproducibility: it's research, so we wanted it to be easily reproducible. We also tested what the well-known GPT models output. In the end we ended up using GPT, because our findings were from a web application, and there are many materials about that, so for that model it was easy to generate what I would call a quality recommendation. We did not feel the need to use anything else, and research-wise it was better to use something that can be reproduced.
>> And we wanted to focus on how to format the input that you give to the model; we played around with it to see whether the format you provide changes the result you get. So we played around with that as well.
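
As a rough idea of what "playing around with the input format" can mean, here is a minimal sketch, assuming the current OpenAI Python client and a generic model name, that asks for a recommendation in the structured shape discussed earlier (multiple mitigations, preferred option, responsible-role tag); the prompt wording and finding text are invented and are not the prompts used in the study.

```python
from openai import OpenAI  # assumes the openai package and an API key in the environment

client = OpenAI()

FINDING = ("Information leakage: the /api/users endpoint returns full API keys "
           "and e-mail addresses to any authenticated user.")

# The "format" being experimented with: instructions about structure and audience.
SYSTEM = ("You write pentest report recommendations. For the finding given, return: "
          "1) two or more mitigation options, 2) mark one as preferred, "
          "3) note performance/downtime impact per option, "
          "4) add a 'responsible role' tag such as 'backend developer'.")

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name, not necessarily the one used in the study
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": FINDING},
    ],
)
print(response.choices[0].message.content)
```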

>> Just one comment, because I've been playing around with this as well, and I guess this aligns with what you found: even the most common models, ChatGPT for example, still hallucinate quite a bit once you get into the more technical explanations. That's just what I've experienced. Thank you.
>> Thank you. We are aware.
>> Yeah. Thank you everybody for coming to our talk and for having this awesome discussion with us. Thank you.