
Awesome. Hi everyone. I'm Adam. I'm a security consultant at CyberCX and an ex-software dev, and I'm also a graduate of the University of Plymouth. Today I'm going to talk about scripts and how they can streamline our processes. I'm going to be focusing on scripts I wrote to save me time going through benchmarks for operating systems.

I had just left my role as a software engineer and I was dipping my toe into professional pentesting for the first time. I drove 4 hours to go meet the team and get up to speed. My first task was supposed to be straightforward: just a benchmark of a couple of operating systems. A couple turned into seven. And I realized that doing this manually,
sifting through the outputs, checking baselines, and formatting was going to take me days. At one point I even had hotkeys set up on my numpad just so I could apply the formatting properly, and even then it took me a while. So, yeah, it was just becoming ridiculous.

Coding's always been something I'm passionate about. For me, it's about really getting your teeth into a problem and finding ways to automate it and solve it. That's also what drew me to pentesting: it's that same problem-solving mindset. So I saw in the outputs we got something that could be automated. My first approach was just to take the
HTML that came out and try to parse it to work out what was failing. That was fine. It wasn't client ready and it didn't look very good, but it did give you a list of everything that failed. From there I went down the rabbit hole. I was merging formats, fixing mismatches, and debugging till 4:00 a.m. The result I was actually quite happy with, as you'll see. It took what would have been days of work and compressed it into seconds, and more importantly it was something polished and repeatable.

So what are we actually going to cover? I want to break it down into what the benchmarks are, the
actual logic of it, how we solved those problems, and what the challenges were. We'll go through an actual run-through of the demo: what we're looking at beforehand, how it runs, and then we get to see the useful result that I slaved so much over afterwards. Then I'm looking forward. It's definitely not perfect. I think this is the thing with scripting: there are always changes to make, especially if you want to try and bring it to a larger audience. But there are bits in there that are reusable and bits worth improving. And I think
there's always next steps to look forward to.

So what's actually the problem? As you can see here, this is part of an output. They're really good, but they are a lot to sift through. I ran a benchmark on my Windows 11 machine on Thursday night and it gave me a 300-page HTML output that I had the joy of running through, and that's a lot. It's really good, but again, it's not super consistent. Yes, you can get benchmark outputs in different formats, like HTML, spreadsheets, and JSON. Each of them by themselves is actually pretty usable and you can do a lot with
them. But when you start trying to compare them to each other, you'll notice internal inconsistencies which make trying to action them programmatically a lot harder, as we'll see later. That's where a lot of the challenges came up with this. So yeah, we could have done this manually, just using the HTML. But it's going to take ages, we're going to be formatting for days, and we're not going to bring any sort of value to it. So I'd much rather find a way to automate it and spend that time actually trying to provide insights into what we provided.
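As a rough illustration of the inconsistency problem described above, here is a minimal Python sketch of matching the same check across two formats when the titles do not line up exactly. Everything in it is an assumption for illustration: the function names `normalise_title` and `match_checks`, the rule-number prefix, the "(Automated)" suffix, and the curly-quote difference are hypothetical examples of mismatches, not details taken from the script in the talk.

```python
import re


def normalise_title(title: str) -> str:
    """Reduce a check title to a comparison key (hypothetical rules)."""
    # Drop a leading rule number such as "1.1.1" (assumed numbering style).
    title = re.sub(r"^\s*\d+(\.\d+)*\s*", "", title)
    # Drop a trailing tag such as "(Automated)" or "(Manual)" (assumed).
    title = re.sub(r"\s*\((automated|manual)\)\s*$", "", title, flags=re.IGNORECASE)
    # Treat curly and straight single quotes as the same character.
    title = title.replace("\u2018", "'").replace("\u2019", "'")
    # Collapse runs of whitespace and ignore case.
    return re.sub(r"\s+", " ", title).strip().lower()


def match_checks(html_titles, sheet_titles):
    """Pair titles from two sources; return (matched, unmatched)."""
    sheet_index = {normalise_title(t): t for t in sheet_titles}
    matched, unmatched = {}, []
    for title in html_titles:
        key = normalise_title(title)
        if key in sheet_index:
            matched[title] = sheet_index[key]
        else:
            # Surface mismatches instead of silently dropping them.
            unmatched.append(title)
    return matched, unmatched
```

Returning the unmatched titles separately is a deliberate choice in this sketch: the remaining mismatches are the ones that need a human to look at them.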
So when I got my teeth stuck into it, we ditched the HTML and started on the XML, and then we kept on dragging in more files because no single file was perfect. There were always inconsistencies. The XML didn't have the padding that was needed when we wanted to get it into a PDF. The PDFs that we used on the input didn't include indentation in code blocks. And there were so many more minor, niche issues that kept on meaning I had to add a whole array of files for what really should have been just one problem. When we got into it early on, we were
using HTML and XML. Now this was great, but they were massive files, especially for that 300-page output that I mentioned earlier. That meant just trying to generate an output took 15 to 30 minutes. So it was not only a long debug process in terms of the issues, but it also meant that optimization was prioritized, just so we could get the outputs faster and realize what we needed to improve. As we'll see later, there were different formats we needed to stitch together. So just trying to find the right place for a code box to sit in
the PDF took me a lot longer than I ever hoped it would.

>> Oh yeah. So, let's check out some of the key points in the script. I'll run you through some of the unique challenges and what we were trying to solve there. Cool.

So, this is the function that finds the pages for each section. The PDFs were really cool, but one of the issues with them is we didn't have bookmarks. That meant whenever the benchmark was broken down into different sections, we had to try and find the different sections ourselves, and we couldn't do this by bookmarks. So what we did is we passed it just a
number from the heading title, and then we go through and search for it in every piece of text in the PDF until we find where the section starts. Then we work from the text size: we look for any text the same size or larger, all the way through to the end, and that way we know where the section ends as well. That gives us the power to find the different sections and pull all the data we need from them.

Cool. Get finding properties. This is sort of the fruit of our labors, almost. It's where we bring it all together, and we take the
output from different sections. We've got to HTML and we've got some other bits in there. So this is where we pull all the different file formats together to make a single, tangible source of truth that we can work with. This is what we used to make the PDF, and this is really where everything lies.

Then, to HTML. This was part of a really interesting challenge. There was evidence in these documents showing report outputs and telling you what files they checked. But when it tells you what files it checked, it would run through every file in the
file system, which meant these could be thousands of lines long. And again, that's not very PDF friendly. So what we had to do, where this was the case, is tell them how they could check this themselves. They provide these spreadsheets and these PDFs on how to check it yourself and the actions you need to take. So what I've done here is I've tried just pulling that information from there. And, sorry, it's all been written in Markdown. So there are things like asterisks, and we've got the triple backticks up there instead of what we want, which is just the HTML formatting. So
this challenge was really just trying to make it cohesive and consistent so we can tie it in with everything else.

So, code box. As I mentioned earlier, we're pulling data from the PDF and we're pulling data from the spreadsheet. The PDF was really cool. I thought, great, we'll use this for all the evidence and all the reproduction steps. Life is good. But then I quickly found out that whenever we had code blocks, there were no newline characters. So that meant I could try and figure out a way to put newline characters back into it, but that's just guesswork from looking at the code.
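Returning briefly to the Markdown clean-up mentioned above, a minimal sketch of turning Markdown-style text into HTML might look like this. It is an illustration under assumptions, not the helper from the talk: the name `markdown_to_html` is hypothetical, it only handles fenced code, `**bold**`, and `*italic*`, and it assumes fences carry no language tag.

```python
import html
import re

FENCE = "`" * 3  # the triple-backtick marker that opens and closes a code fence


def markdown_to_html(text: str) -> str:
    """Convert a small, assumed subset of Markdown to HTML."""
    parts = text.split(FENCE)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd-numbered parts sit between fences: keep line breaks verbatim.
            code = html.escape(part.strip("\n"))
            out.append("<pre><code>" + code + "</code></pre>")
        else:
            prose = html.escape(part.strip())
            if not prose:
                continue
            # Bold first, so "**x**" is not half-consumed by the italic rule.
            prose = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", prose)
            prose = re.sub(r"\*(.+?)\*", r"<em>\1</em>", prose)
            out.append("<p>" + prose.replace("\n", "<br>") + "</p>")
    return "\n".join(out)
```

Escaping before applying the tag substitutions means any angle brackets in the source text are shown literally rather than being interpreted as HTML.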
So yeah, it's not super feasible. What I had to do in the end was find every point in the PDF where we include code blocks, then pull them from the spreadsheet-to-HTML function that we just saw, and literally just splice them together. So we've got data from two different sources intermingling, almost.

>> Yeah. Cool. So let's have a look at the actual inputs, how it runs, and what we get in the output. Cool. So, this is the sort of thing we're working with. So, you can
see here, this is page two of 300 right here. And it just goes on and on. There's a lot failing, there's a lot passing, and there are a lot of checks you have to do. And this is the sort of thing we get out of it.

So for this example, I've given it two different hosts to work with. It's running through and checking both sets of benchmark files, tying them together, and putting them into a single PDF that we've got here. That would have been about a day's worth of
work at least that we've just been able to save, and we can put that time elsewhere, into actually studying the information instead of just trying to format it. But this is what we get out. The branding is consistent with CyberCX and our documents. We've got a table of contents. As you can see, it's page 1 of 207, so it doesn't make it massively shorter, but it makes it a lot easier to work with and a lot easier to read through as well. And as we can see here, this is the evidence section I mentioned earlier. So,
yeah, the formatting is consistent and we've been able to produce something more client friendly.

So, what do we reuse? All of those HTML formatting helpers. They've just been really useful in other projects, especially when I want to get things more consistent. The CLI argument parsing, so we can actually pass arguments to it. It's a bit of a nightmare because, again, we've got four or five different files for each one. But you can do it, and I'm hopefully going to improve that in the future. And then we've got the PDF parsing patterns and branded templates, because those have been really
useful as well. And there's always a point in trying to keep those for the future.

>> Yeah. So there are still loads of things I want to improve. Improve error handling, because of the challenges and issues I've raised here. There are always minor inconsistencies, and on Thursday night when I was trying to make the demo I was having one of those minor errors, so that's always a pleasure. Upgrade the CLI. I think that would be really cool, because right now we have to get the user to pass it like five different files, and it's not
intuitive. It's not super cohesive, but it's doable. Auto-indexing bookmarks: I'd love to have those built into the PDF so you can run through and get all those bookmarks, just like the issue I mentioned with the inputs. Mismatching spreadsheet names: that is the biggest issue I've been having with this. The spreadsheets don't play ball with the documents you've been given, and again, it's one of the minor inconsistencies which has made this such a challenge.

So if you're looking to take this forwards and try to do something
with this, I've got my GitHub up there. It's not live right now because I've been naughty, but the code I've shown today is on that GitHub. Automate the Boring Stuff: if you haven't read this, it's an excellent book, especially when you're trying to deal with challenges like this. It covers digging into Word documents, digging into websites, and trying to automate daily tasks, and that is at the core of what I really want to get across today. So if you're going to check out anything, just check that out. Advent of Code: this is a personal favorite of mine. It's not
core programming and technicality, but it is problem solving, and for that I really like it.

So, this started as something quick and turned into something I was able to rely on all the time. It saved my managing consultant, Chris, multiple times from just tearing his hair out, and I know he likes me for it. I've also had the pleasure of it being used internationally as well, so that's been a real privilege. And it's saved us hours and hours across the company now. That's been really cool. So yeah, I could have fixed
the problem in a day. So, naturally, I spent a week just trying to automate it. And it helped me realize how much repetitive work we just tolerate because that's how it's always been done. But once I had the first version working, it really led to something more, and that was really cool. So even if it doesn't save time up front, it can always be worth investing time into shortening and streamlining these processes, especially where it improves consistency as well, because next time we get one of these it's going to be the same and they're
not going to worry about where their baselines are and how they've changed.

So thank you very much for your time. Happy to dig into anything or any questions anyone might have. Thank you.
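The section-finding step described earlier in the talk (search for a heading, then read on until text of the same size or larger appears) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code from the talk: the `Span` type and `find_section` name are hypothetical, and the spans are assumed to have already been extracted from the PDF, in reading order, by whatever PDF library is in use.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One run of text from the PDF (hypothetical, pre-extracted)."""
    page: int
    text: str
    size: float  # font size in points


def find_section(spans, heading_number):
    """Return (start, end) indexes of a section, or None if not found.

    start is the heading span; end is the index of the next span whose
    font size is the same or larger, which marks the next heading.
    """
    start = None
    for i, span in enumerate(spans):
        # The trailing space stops "1.1" from matching "1.10 ...".
        if span.text.strip().startswith(heading_number + " "):
            start = i
            break
    if start is None:
        return None

    heading_size = spans[start].size
    end = len(spans)  # default: the section runs to the end of the document
    for j in range(start + 1, len(spans)):
        if spans[j].size >= heading_size:
            end = j
            break
    return start, end
```

One caveat with this sketch: if the PDF has a table of contents, the first match may be the contents entry rather than the real heading, so a real implementation would need to skip those pages.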
>> Have you considered using something like LaTeX, or Typst, which is a newer one, to format the PDFs?
>> I wouldn't know. What I've done here is I've used a tool that converts HTML to PDF, because with my software engineering background that was just hands down the choice for me. So it's all HTML behind the scenes. We've got headers and the rest in the HTML as well. So no, not really.
>> Yeah. What do you do about a false positive in the main report?
>> So if there's a false positive for the main report, there's one source of truth for whether findings fail or pass. So when you're going through it and checking it
there, you can change it there. So you can change what goes into it. You are still very much in control of those processes. It's purely a way of saving time where you'd otherwise be staring at words all day. Awesome.
>> Thank you very much.