
My name is Ryan, and it is my great pleasure to extend a very warm welcome on behalf of BSides Oslo, the volunteers and organizers, and all of our great sponsors. Thank you to the gold sponsors, who have not only funded us to make this happen but are also downstairs participating. So a shout-out to Defendable and Mnemonic, returning sponsors, and to Promon, joining us this year. Much appreciated.

This event has been going on for a while now. BSides in general started in 2009 in Las Vegas, for those of you who don't know. It's a framework that has been expanded to help local communities organize grassroots events that are, hopefully, radically accessible and inclusive. So we try to keep the price low and keep it to one day, one track, so we have a shared experience here. We haven't changed much in five years of our Oslo event, and that is on purpose, in keeping with the feedback we've gotten from this great community. There is a tremendous amount of talent and focus, and a lot of great events in infosec and adjacent fields in Norway, but we're responding to the feedback we've gotten to fill this small niche for a relatively small group: a single-track, shared experience, kept fairly technical. That's the gist.

We have changed a few things in five years. I want to say thank you to Jens from Elvebakken for the new logo. I hope you ordered a shirt; if you didn't, you're probably regretting it now. There might be a few left over for sale after lunch. It's also the first time we printed the logo in color, because we didn't have color before. And we have a volunteer forum now. To further connect with the community and figure out how the event should evolve over time, we have put a form on the website asking for volunteers, not just on the day of the event, but to help us year-round, when most of the work is being done on everything from media to sponsor management to the CFP, the call for presenters, and so on. So if you would like to be more involved in planning and running this event year-round, we would love to talk to you about that. Check out the form.

To kick us off today we have a great speaker. I've had the privilege to hear him speak before, and I thought this would be a great way to kick us off in uncertain times: to show how what many people view as, let's say, technical issues can impact global supply chains, geopolitics and national security. With military police experience, and a security philosopher, please welcome Simen Bakke.
>> Right, do I have 45 minutes? Okay.
>> Thank you for the introduction and the invitation to speak here today. It's a great honor to be here. I know there are a lot of technical people in this room; this will be a strategic brief, so no technical indicators. This brief is about the Chinese ecosystem for offensive cyber operations. And what do I mean by the Chinese ecosystem for offensive cyber operations? It is how China uses the civilian hackers and the IT industry inside the country to conduct operations on behalf of the government and the intelligence services. This is a topic that interests me very much, and there are a few people globally who do a lot of research on it, whose work this presentation builds on. So thanks to those who deliver great insight and knowledge on this topic; that is what allows me to give this presentation here in Norway as well.

For those of you who don't know me: my name is Simen Bakke. I work with the police IT services in Norway. I mainly do strategic threat intelligence, but I also work some with cloud services and with the technical guys in our department. So I'm not the technical guy, but I have the strategic picture; now you know where to place me. And as you have probably noticed, this presentation will be done in "Norwish", so both the English speakers and the Norwegians can follow along.

Okay. So what was the background for my interest in this topic? I was invited to write a chapter in this book, Cybermakt ("Cyber Power"), which was published last year. There were a lot of contributors from the defense sector and the Norwegian Defence University College, and I wrote a chapter looking into the Exchange hack, the operation many of you probably remember, where the parliament in Norway was breached through zero-days in Microsoft Exchange Server, and into the supply-chain attack against the SolarWinds Orion platform. I'm not going to speak about the latter in this presentation. I'm going to speak about the Exchange hack, where the Norwegian parliament was hacked by two APTs connected to China's intelligence services, named Hafnium and APT31. That got me interested in how these different groups in China cooperate with each other and with the Chinese intelligence services, because there was not only one group exploiting these zero-day vulnerabilities: there were at least four Chinese APTs, and I've seen some researchers count almost ten different Chinese groups, exploiting the same vulnerabilities in the same time frame. And it was not only the parliament in Norway that was breached: about 10,000 Exchange servers worldwide, some of them in Norway, a lot of them in the US, and others around the whole globe.

Last year we also got a lot of information from the US that American critical infrastructure and telecommunications had been compromised by two different Chinese "Typhoon" groups, per Microsoft's naming convention. In the critical-infrastructure case, where Volt Typhoon was behind it, they compromised water supplies, energy facilities, transport and logistics, systems that don't store sensitive information. They are critical infrastructure, and the only reason to pre-position yourself in that kind of system is to be able to conduct sabotage at the moment you really need that opportunity and capability. So the first attack was mainly driven by espionage and the wish to gather sensitive information, but this second operation was about pre-positioning for sabotage, which annoyed the American security services quite a lot.

The Salt Typhoon hack compromised telecom infrastructure in the US. They have been inside at least nine different telecommunications providers, and also inside the systems that deliver secret intelligence to the FBI. So when the FBI surveils Chinese spies inside America to conduct counter-espionage, because of course the American government wants to know which spies are in the country, and the Chinese hackers gain access to those same systems, they can identify which spies from China are under surveillance by American authorities and which are not. They can then decide whether to pull some of those spies out because they are under American surveillance, or to replace them with new personnel, for example. So this is strategically a very good operation by China, and of course it annoys the security industry and the government in the US.

This briefing is in two parts. Part one: how do China's APTs, advanced persistent threat groups, continuously get information about new zero-day vulnerabilities to exploit? Part two: who in China is conducting the operations, and how are they connected to the intelligence services? I'm going to use the operation against the Storting, the parliament in Norway, as an example of how part two works.
Okay. This is a chart made by Eugenio Benincasa, a researcher at ETH Zürich, the university in Switzerland, where he has tried to draw how this ecosystem works, and it's quite good, actually. So let me give a simple explanation of it. First you have the CTF teams, the hackers, who of course go to hacking competitions. Some of these teams come out of big universities in China, and they also team up with smaller or bigger Chinese IT companies. These are the civilian hackers. They compete in, for example, the Tianfu Cup, which you can see in the middle here under the vulnerability pipeline. At the Tianfu Cup especially, they try to hack Western IT products using their zero-day vulnerabilities and, of course, the exploit code they write for those vulnerabilities. And then comes the interesting part with China: through this process at the Tianfu Cup, they are forced to report the zero-days and the exploit code to the government. This makes hacking competitions inside China quite different from what we are used to in the Western world, where we report findings to the vendors, who can then ship security updates and patches for their products. In China they are forced by legislation to report directly to the government, and the government then decides: do we keep this vulnerability for ourselves, or do we give it back to the hacker so he can report it to the vendor, and the vendor can patch the product? A lot of zero-day vulnerabilities are still reported by Chinese citizens to the vendors, but they first have to go through the government, and if the government wants to keep a vulnerability, because it sees that this is a good one it wants to save for later exploitation, the researcher is prohibited from reporting it.
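The government-first reporting flow just described can be sketched as a toy triage model. To be clear, this is my illustration, not a real system: every class name, field and threshold below is invented, and the actual selection criteria inside the Chinese government are not public.

```python
from dataclasses import dataclass

# Toy model of the government-first reporting flow described above.
# Names, fields and the CVSS threshold are invented for illustration;
# the real selection criteria are not public.

@dataclass
class Finding:
    product: str       # affected product, e.g. "Exchange Server"
    cvss: float        # severity score, 0.0-10.0
    has_exploit: bool  # working exploit code accompanies the report

def government_triage(finding: Finding) -> str:
    """Decide whether a reported zero-day is retained or released."""
    # High-severity bugs with working exploits are the ones most likely
    # to be kept as a "strategic resource" rather than sent to the vendor.
    if finding.cvss >= 9.0 and finding.has_exploit:
        return "retain"   # researcher is prohibited from telling the vendor
    return "release"      # researcher may report it so the vendor can patch

print(government_triage(Finding("Exchange Server", 9.8, True)))   # retain
print(government_triage(Finding("Minor utility", 5.3, False)))    # release
```

The only point of the sketch is the ordering: the vendor comes last, after the government's triage, which is the inverse of the coordinated-disclosure flow Western researchers are used to.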
So that is part one. In part two you have the Ministry of State Security, the foreign intelligence service, which conducts operations across the whole world, also here in Norway. They distribute these zero-day vulnerabilities to regional bureaus, because the intelligence service in China has a decentralized structure, and those bureaus contract with private IT companies in China to conduct operations on their behalf. Sometimes they have also set up front companies, companies that do nothing but hacking on behalf of the intelligence services, created for that reason alone. That is part two of the presentation.

So now we dig into the details. How do China's APTs continuously get information about new zero-day vulnerabilities? As all of you probably know, zero-day vulnerabilities are, not always, but often, the most useful ones; sometimes you need them to exploit hardened systems that are really secure. For example, a lot of the operations conducted against businesses in Norway just use a bad password or an already known exploit. But this sits a bit higher up the threat scale, because they have this opportunity as well.

In 2013, Xi Jinping came to power as president of China and stated that China was going to be a big power in the cyber domain, which it has become in just over ten years. The Chinese teams competed in Pwn2Own, the white-hat hacking contest in Canada, from 2014. That year Keen Team was the only Chinese team participating, and they exploited several zero-days in Apple Safari and Adobe Flash. In 2015, Qihoo 360, a big IT company in China, compromised Internet Explorer 11 and the Google Chrome browser. And the same team won the "Master of Pwn" title and 520,000 US dollars in 2016.
In 2017, Tencent, the big Chinese technology company, won $350,000 by exploiting zero-days in VMware, Microsoft and Adobe products, and so on. Here you can see the chart. The first year, Keen Team was the only Chinese team; the second year, 2015, there were two Chinese teams; in 2016 there were four; and in 2017 there were seven, and they did very well. The first year, that single team took 13% of the total prize money; the last year, the seven Chinese teams took 79%, and teams from the rest of the world took 21%. This just underlines that the Chinese hackers were doing very well in this competition.

But from 2018 there were no Chinese teams participating in Pwn2Own in Canada anymore, and there is a reason for that. At the fifth China Internet Security Conference, the CEO of the big Chinese IT firm Qihoo 360, Zhou Hongyi, gave a speech saying that vulnerabilities are weapons of cyber warfare and should be considered a national strategic resource: China should keep these vulnerabilities, not hand them over to producers and other governments. And then you get legislation that forbids Chinese hackers from competing abroad. It is not an absolute prohibition; they have to apply to the government for permission to compete outside the country, so the government keeps control over the hackers. That annoys the hackers too, because they used to make a lot of money in these international contests; now they don't have the same opportunity, and the Chinese hacking competitions don't pay as well as the private vendors' bug bounty programs, for example.

So the Chinese hackers now have to stay in China to compete. This is a picture from the Tianfu Cup, one of the big hacking competitions. It's not the only one: you also have, for example, the Matrix Cup, and 161 annually recurring events like CTFs inside China. China is a big country, so there's nothing abnormal about that. What is abnormal is the reporting of the vulnerabilities to the government. Behind the Tianfu Cup specifically stand Alibaba, Tencent and Baidu, the big Chinese technology vendors, and since 2021 the military, government, intelligence services and police have been more present at the event, partly to watch what happens there, but also to recruit some of the nation's greatest hackers. The targets are typically Western products: Apple iPhone, Google Chrome, Microsoft Edge, Office, Windows (when I made this slide they had Windows 10 on the list; I guess it's 11 now), and also Exchange Server, which was hacked during the attack on the Norwegian parliament, and VMware products, which, as most of you know, are very important to big enterprises as well. So the goal here is to gather these zero-day vulnerabilities and to recruit some of the nation's best hackers.

A few years later, in July 2021, came the legislation named the Regulations on the Management of Network Product Security Vulnerabilities. It states that vulnerabilities must be reported to the government, the MIIT being the responsible institution, at most 48 hours after they are identified. If you do not comply, you can be punished, and we have a few examples of that. There is also a prohibition on reporting these vulnerabilities to entities outside the Chinese government, for example Western hardware and software manufacturers, unless you are permitted to, and that is decided by a selection process inside the Chinese government.

We can also see the effect of this law in China's national vulnerability database, the CNVD. As you can see on the chart, from 2018 to 2021 the number of vulnerabilities reported to the official database increased every year. But in 2022 it goes down, and the researchers behind this report argue that the vulnerabilities sent directly to the Ministry of State Security, the foreign intelligence service, go into its own, unofficial database: the vulnerabilities the government wants to keep for further exploitation end up in the undisclosed, secret one. That is one of the reasons for this drop in the number of reported vulnerabilities, which we would normally expect to keep rising. Over 2,000 vulnerabilities have been reported directly to the Ministry of State Security, and over 140 of them were identified as critical severity, with a high CVSS score: easier to exploit, and with bigger potential consequences.

We can see this particularly in the part of the database concerning industrial control systems and vulnerabilities in operational technology, which, as I opened this presentation with, Chinese hackers have used to compromise critical infrastructure in the US, to be able to conduct sabotage at a strategically chosen time. For example, if the conflict over Taiwan escalates, and the US says it will come to Taiwan's defense, and China uses this capability to put pressure on the US, it will be interesting to see how the US responds. Will it have enough to do with its own internal problems, or will it respond and help Taiwan? This is of course nothing we know for certain, because the Chinese are not officially announcing "if you defend Taiwan we will sabotage your critical infrastructure", but it is the most plausible theory we have today. And as you can see, there were on the order of one to two hundred ICS vulnerabilities reported each quarter, but from 2021, when this legislation went into effect, there were very few. And the reason for
that is probably that China sees these as a strategic resource it really wants to keep in its own hands, in case it wants to conduct sabotage at a later time. So those vulnerabilities are reported directly to the Ministry of State Security.

Some key events to summarize. The Chinese president comes to power in 2013 and states that China is going to be a superpower in the cyber domain. The Chinese teams win at Pwn2Own. Then comes the speech from Zhou Hongyi saying that these vulnerabilities are a national strategic resource. The hackers are prohibited from competing abroad. The Tianfu Cup is launched. The vulnerability-reporting law takes effect, and we can see the number of vulnerabilities in the official database falling. And then, and these are just a few examples, there are of course many others, the Norwegian parliament is hacked by two Chinese actors, and critical infrastructure in the US is compromised. The Americans went public last year saying they had been compromised, but the operations had begun far earlier, around 2021–22, so this had been going on for quite a while. In my mind this paints a picture of how China is rigging its society to leverage all its resources, not necessarily all, but all the resources it wants to use, in cyber operations. Now this approach is taking effect, which makes the Chinese actors very capable, and that is probably something we are going to see a lot more of in the years to come.

So, over to part two: who in China is conducting the operations, and how are they connected to the intelligence services? This is a chart from the book chapter I wrote on this topic, where I did a lot of open-source investigation to build this timeline. I have not interviewed the parliament; they might have a different view of what happened, but from public sources this was the story that was available. And there was a good reason I wanted to work from public sources: it's not everything they necessarily want to talk about, and if you can write something like this from public sources, you don't have to worry that some of the information might be classified, and so on.

In 2021, the government of Norway publicly attributed the attacks to China, and the Police Security Service (PST) stated in 2024 that the first attack on the parliament was likely conducted by Hafnium. That attack took place on the 17th of February 2021, two weeks before Microsoft went public and said: there is a critical vulnerability in Exchange servers, patch immediately. So the Chinese threat actors were already inside the parliament's systems. And just to note: Hafnium also exfiltrated 4,000 emails from Michael Tetzschner, the deputy chair of the foreign affairs committee in the parliament. He probably had a lot of interesting intelligence in his Outlook mailbox as well. PST said the second attack was likely conducted by APT31, and that happened on the 5th of March 2021, three days after Microsoft released its security update and patch, and two days after the parliament itself said its Exchange servers had been updated. As many of you probably know and understand, if the threat actors got access to the server and installed webshells, they could come back to those servers even after patching, so the patch does not necessarily help. I have not had this confirmed by the parliament itself, but around 10,000 Exchange servers were backdoored by these Chinese actors, which let the threat actors return to those servers whenever they wanted. So when Hafnium first got in, probably backdooring the server with webshells, APT31 could go inside later. So there is some kind of information sharing between these groups, letting both of them, and more of them, leverage the same zero-day vulnerabilities.

From 2024 into 2025, the US Justice Department has issued indictments and charges against some of the hackers behind these two operations. From the official American documents we can read that in 2024, APT31 was linked to a front company named Wuhan XRZ; under Microsoft's convention, Violet Typhoon is one of the actors connected to APT31. And in 2025, Hafnium was linked by the US government to APT27, which is now named Silk Typhoon under Microsoft's naming convention; Hafnium was
the old name. And i-Soon, the private Chinese hack-for-hire company many of you probably remember from the i-Soon leaks about one and a half years ago, sold data stolen by APT27 for between $10,000 and $75,000 per email account. So when Hafnium hacks the parliament's email servers and exfiltrates 4,000 of Michael Tetzschner's emails, that data could possibly be sold inside China to different actors for that kind of money. I'm not saying we know this happened, but it is a possibility.

Looking at the charges from the official US institutions, we can see that Yin Kecheng and Zhou Shuai are two of the charged hackers behind the operation, on the Ministry of State Security side. Zhou Shuai, "Coldface", was one of the elite civilian hackers from the early 2000s, and he is now one of the big guys conducting these operations in China, but he was a civilian hacker back before the government got seriously interested in offensive cyber operations. So some of those who were civilian hackers were later recruited by the government and are now in positions to, for example, conduct this operation on its behalf. They ran operations against the US and Europe from 2013 until 2024. The parliament in Norway was compromised by this group in February 2021; they exfiltrated data and sold it to customers inside China, and the data was also sold through the hack-for-hire company.

Then an interesting thing happened this summer, because from the same group, a Chinese citizen connected to the Hafnium group, Xu Zewei, was arrested on the 3rd of July 2025 at the airport in Milan, Italy. When the US runs such an investigation, it can issue an arrest warrant for a suspect. If he then travels to a country that has an extradition agreement with the US, so that he can be brought before a court and the whole judicial process can run, this is what happens: this guy lands at the airport in Milan, inside the Schengen area, which is part of Europe; when he scans his passport, an alarm goes off in the system saying this person is to be arrested and later charged by the American government. This is one part of the international cooperation, and it actually underscores that investigations can be effective: not only incident response, which I guess many of you work with, but also, when one of your companies is breached, contacting the police and letting them run a real investigation. This can be the outcome, and it is probably one of the best possible ones. I'm not saying arrests are the usual result of cyber investigations, but we have a few cases; for example in the Hydro case, the ransomware attack against the Norwegian company Norsk Hydro, there have also been arrests. So in some cases we get these showcases, as you could call them. Xu Zewei was charged with computer intrusions and theft from February 2020 to June 2021: he hacked American universities, going after COVID research, during the start of the pandemic. That is of course a very interesting intelligence objective. So when these guys conduct operations, it is not just for fun; it is because the government has intelligence requirements. They want sensitive information, often very specific information, and they direct their operations against the targets that hold that particular information. And he conducted operations on behalf of the Shanghai State Security Department.
So let's have a look at the hack-for-hire company, where eight i-Soon employees and two officers from the Ministry of Public Security, the internal security service in China, were charged. The primary clients of i-Soon were the intelligence services, the MPS and the MSS, and i-Soon worked with at least 43 MPS and MSS bureaus in China. The prices, as I mentioned before, were between $10,000 and $75,000 per exfiltrated email account. They also sold software for conducting offensive operations and hacking, as you can see on the left side here. If you want to make it easier for people inside a company or a government agency to conduct hacking, you will probably build largely, not fully, but largely automated software that makes operations easier to run, so that more people become able to conduct offensive operations.

i-Soon's main office is in Chengdu, in the Sichuan province of China. So if you see an IP address in your networks coming from the Sichuan province, from the city of Chengdu, it might be this company trying to access your internal servers. But there are a lot of other IT companies in Chengdu as well, so this is not the only one; it is a big technology city in China. And here you can see the entrance to their main office.

Then the last company charged by the US Department of Justice. You can see that these seven persons are connected to APT31. They operate under the Hubei State Security Department in the Hubei province of China. The front company, named Wuhan XRZ, was created in 2010, and from 2010 to 2024 they conducted offensive operations. In March 2021 the Norwegian parliament was compromised by APT31, and APT31 has also conducted operations against Fylkesmannen, now Statsforvalteren, the county governor, in Norway, against Helse Sør-Øst and Sykehuspartner, the health institutions, and against, for example, Visma, the private IT company here in Norway. So the parliament is not the only organization breached by this APT31 activity cluster, and that's important to say. I'm not saying that these seven guys conducted the operation against the Storting, but they are part of the activity cluster we call APT31. There are probably a lot more people connected to this cluster; these are just a few of the ones we know of.

These hackers can be named because US intelligence and law enforcement have such good investigative capabilities that they are actually able to attribute an operation down to specific persons, and that is a capability that is very rare and often highly classified. So when they go public with this information, that's very good for us, because we learn a little more about what's going on. By contrast, the hacks of both the Storting and the Norwegian ministries in the summer of 2023 — I guess you remember the Ivanti vulnerabilities that were exploited — have not been charged down to individual persons behind the attack; the ministries hack has not even been attributed to a country. So maybe we in Norway also need capabilities to be able, sometimes, to go after these hackers on an individual level. The reason the Americans go public in this way is that they believe it will deter the hackers from wanting to hack American companies and the American government in the future, because now they know that if they travel to a Western country that has an extradition agreement with the US, they will be arrested, extradited to the US and charged for their hacking. The US considers this a preventive strategy. And these are hackers who usually do not want to be exposed: they want to stay under the radar, stay silent, avoid attention. So there is a big difference between these guys and, for example, the hacktivists who try to DDoS our web servers, because all the hacktivists want is attention; we cannot handle those two groups the same way.

They also targeted over 400 IPAC members. IPAC stands for the Inter-Parliamentary Alliance on China: for example, politicians who are hard on China when it comes to human rights. So this is a way of leveraging their capability to threaten the IPAC members who speak out against China's human-rights abuses. Wuhan XRZ you can see here in the Hubei province, and I guess most of you remember the city of Wuhan from the corona epidemic; it operates under the Hubei State Security Department, which you can see a picture of here. So if any of you are in China sometime, you can have a look at this big, beautiful building.

So, to sum up, the Chinese ecosystem for offensive cyber operations, as presented in my talk, looks roughly like this. You have the civilian hackers on the left side, for example Keen Team, and smaller or bigger IT companies working in China.
You have the hacking competitions like the Tianfu Cup, where they exploit a lot of Western products. They also exploit Chinese products to improve their security — for example internet-connected vehicles, which is a big topic in Norway right now with the debates about these buses and cars. Norway is actually one of the first countries worldwide to adopt this Chinese technology at scale; there aren't that many of these electric vehicles around other parts of Europe, so we're among the first to go through this debate. It will actually be very interesting to see how we handle it — whether they're backdoored all over, or just vulnerable. They do want to patch their own products as well, but they also want the capability to hack our Western systems, Western vendors, and Western products through the reporting of vulnerabilities to the government. And then you have the Ministry of State Security representatives behind the operations, who run them from the regional offices and get front companies, or contracted companies, to conduct the operations on behalf of the intelligence services. That is also one of the reasons it's hard to attribute an operation back to the government: there is distance between those who are doing the hacking and those who are connected to the government. Unless you can get into the communications and see that a group is orchestrated by the government, it's very hard to say definitively that the government is behind an operation — even though it might be. So that is how I sum up the system, and there are plenty of other ways to tell this story; I've just chosen a few examples of how the system works in the big picture. So when we discuss how China leverages its capabilities to hack Western companies and governments, keep in mind that they have this big system supporting and underlying it. And this is very frustrating for the Americans, who are now seeing the Chinese really pull ahead in capacity, even outnumbering the American intelligence services — they have a lot more resources than the Americans have. These are some of the sources I used to make this presentation; if any of you would like to read more, have a look. So thank you — I hope you now know a little bit more about the Chinese ecosystem. [applause]
>> Thank you, Simon. Are you around for the rest of the day, or some of the day, if folks want to ask you some questions? >> Yeah, I'll be here until lunch. >> Great — you heard that, you've got until lunch. Thank you so much. Here's something from us: a donation to mendom on behalf of all our speakers today. Thanks again, Simon. >> Thanks. >> And up next, we're going from the global picture down to the inside of this man's laptop: a former forensic investigator with the Dutch police, now a principal security researcher at Elastic Security Labs, all the way from the Netherlands. Give a warm welcome to Remco Sprooten. [applause] >> Thank you. Should be good, right? Yeah. Any sound? Wait — there we go.
Oh yeah, I've got it. Sorry. So, we should be live. Give me just one second to connect everything and we should be good to go. ... There we go. That's better.
Way too many windows — that's better. Sorry for the delayed start. Welcome, everyone, and thanks for having me. As I said, my name is Remco and I work for Elastic, within the Elastic Security Labs team. Whoa, the screen is big. [laughter] Within Elastic Security Labs I mainly do two things: I reverse engineer malware, focusing particularly on Linux malware, and I do applied machine learning. "Applied" is really the keyword there, because I am not a machine learning expert and I'm not an AI expert, but I am a big fan and user of everything AI — and specifically Linux is my thing. Now, Linux is, in my opinion, a very under-researched platform in terms of cyber security. A lot of technology goes into the Microsoft space, a little bit into macOS, but Linux — although we have billions of devices running on it — is heavily under-researched. The idea for the talk I'm giving today is actually the result of something that happened to me a couple of months ago, and that's what you see here on the screen: a paper that a colleague and I published a couple of weeks ago at Virus Bulletin in Berlin. How did we come up with that research? Well, about eight, maybe nine months ago, I sat down with a colleague — he also focuses a lot on Linux, from the threat detection engineering side — and we decided the world needs to be more aware of Linux vulnerabilities and Linux malware. So we decided to do some research into the current state of Linux rootkits specifically. Well, as one does, especially if you're a bit of an overachiever, you start making a plan. You pick the top 10 most commonly used rootkits you want to research. Then you slip down a rabbit hole into the top 20, because this sample is interesting and we mustn't forget that one. And then of course you end up with the top 30, 35 samples, and every family has its own variants and related material you need to look at. So before I knew it, I was looking at a pile of 200 to 300 Linux rootkits that I wanted to investigate. Also, as one usually does, two weeks into the research we decided: this is going to be really interesting — let's write an abstract and send it to something like Virus Bulletin. It'll never get accepted, but at least we might have a platform to publish on. Then, as one does, we forgot about the research again. A month later you get this amazing email that you've been accepted and need to send in your paper within two months of the acceptance mail. Common sense, of course — we all recognize this — but I had a slight problem: a huge stockpile of about 300 samples that I needed to reverse engineer. Every sample I want to look at takes me roughly between one and four hours to fully reverse engineer and answer the questions I need to. So, I don't know, 1,200 hours is just a little bit much. Now, at the same time, people in the reverse engineering community were looking into using machine learning models — LLMs, to be specific — for reverse engineering, using a technique called MCP. I'll get into all these terms, so don't worry. I was intrigued by that, and I thought: well, maybe one of those LLMs can help me get this stockpile down a little. So I set everything up and started working with it. What I want to do today is take you on the journey I took over the last couple of months, going from just looking at my disassembler and reverse engineering samples all the way up to an automated pipeline that uses different machine learning models to do the analysis. I only have 45 minutes, so I'll go as fast as I can, and I'll probably skip over a few things. I only have a few slides; most of the rest of the talk is real, technical demos. I hope that by the end of today you'll be inspired to at least try to set something up yourself, if you're in this space. Now, I quickly mentioned the first technology, and that is MCP — as I said, I only have a few slides, a few terms to explain before we dive into the demos. What is MCP? It's the Model Context Protocol. It is designed to give LLMs — for anyone who's not familiar, LLMs are large language models like ChatGPT, Gemini, Grok, the ones that are all over the news right now — access to more than just their own knowledge. It's a way to talk to tools, functions, source code. On the left side I have an extremely useless but small demo of how you could build an MCP tool. FastMCP is the most commonly used library for this, and Python the most commonly used language. What this actually does is define an MCP tool called "add", which gives an LLM the ability to add two numbers. As I said, it's completely useless, but it demonstrates how you would build something like that. If you want to start building things, the documentation for FastMCP is amazing — just look it up online.
Now, one thing about MCP is the cost. You have to imagine that every word you send to an LLM is roughly a token, and you pay per token. Not a lot — we're talking pennies — but if you add them up, it becomes a lot. When you're chatting with ChatGPT, API requests are being made to the server in the background. And what you see at the top — thank God for a big screen, you can actually read this — is how a normal LLM query, or API call, is constructed. You have a system prompt that tells the LLM: you are, in my case, a reverse engineer; this is your task, with some detailed instructions. And then you have the human part — in my case, "please explain this code", or whatever; it's just a short example. Now, if you give an LLM access to MCP-based tools, you need to tell it: hey, you've got access to these tools. So the API request for sending a single tool along looks like this: basically the same API call, but you add information about the function. What kind of information? A description, the name, and any parameters you need to pass to the function. This is just a very short example, but you can actually give the LLM a lot of information on how to use these functions and this context. As you can see, the size of the request you're making has increased dramatically, and that's from adding just one tool. If you want to add, say, 50 tools, the number of tokens you're using goes up by hundreds of percent. In a couple of slides we'll see why that might be an issue. Now, this talk is about reverse engineering — ah, yes, thank you. So let's see how we set this up in a reverse engineering context.
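The before-and-after request shapes he shows on the slide look roughly like this — an OpenAI-style tool schema; the model name and field values are illustrative, not from the talk:

```python
import json

# Chat request WITHOUT tools: just a system prompt and a user turn.
plain_request = {
    "model": "some-model",
    "messages": [
        {"role": "system", "content": "You are a reverse engineer."},
        {"role": "user", "content": "Please explain this code."},
    ],
}

# Same request WITH one MCP-derived tool attached. The whole schema
# rides along on every single call, which is where the extra tokens go.
tool_request = dict(plain_request)
tool_request["tools"] = [{
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer"},
                "b": {"type": "integer"},
            },
            "required": ["a", "b"],
        },
    },
}]

# The payload grows with every tool you attach -- 50 tools means
# hundreds of percent more tokens per request.
print(len(json.dumps(tool_request)) - len(json.dumps(plain_request)))
```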
What I'm going to do is set up an LLM client. Most of you have probably typed into the web interface for ChatGPT, Gemini, maybe Grok — that web interface is considered an LLM client. If you want to do more advanced things with LLMs, you need a different kind: either you code one up yourself, which is overly complex, or you use one of the free tools I'm going to use today. In the slides I have links to the different tools — you can take a picture, or I can distribute the slides after the talk. In this example, we have an LLM client that uses the MCP protocol to talk to what I call an MCP bridge. That bridge is a small Python script that supplies the LLM with all the tools it needs to do my job: reverse engineering a binary. Now, for my job I use a disassembler. We have multiple of those; my preferred one is Binary Ninja, but some people prefer IDA Pro, or even Ghidra — can't imagine who, but someone probably prefers Ghidra. Why Binary Ninja? I'm a huge fan, I've been using it for years, and Binary Ninja has, in my personal opinion, the best programming API for scripting and writing plugins. So the MCP plugin was an easy development for most people, and since I know the API quite well, it's easy for me to make adjustments. Just trying to keep track of my notes here. To be fair, these are the links for the different MCP projects. I'll be using the top one here, but if you're a paying IDA user or a Ghidra fan, there are multiple MCP implementations you can use. If you don't manage to snap a screenshot of this, just Google "MCP" plus your favorite disassembler and you'll find one. I should also briefly mention LLM clients. The client I'll be using is Roo Code, a simple plugin you can install in Visual Studio Code. It's free — you can just run it, configure it, and be done with it. I've also used Cline, I've used Cursor, I've used GitHub Copilot — you can configure that somehow — Claude Desktop, and there are many, many more. I just haven't used them, so I can't really recommend checking any of those out. But these are the tools I'll be using. Now, this was the last boring part. If you actually want to communicate with the LLM APIs, as I said, you need a client — and you also need a billing subscription. You can't use the ChatGPT license you normally have; you need to go through an API, and a lot of people use something like AWS Bedrock or OpenRouter as a payment proxy for that. I'll get into that in a bit. Now, this is my cue to switch over to the first demo. Let's see — it would be nice if it actually switched to the right desktop. Here we go. I know it's hardly readable; there's way too much on the screen.
But it's really not important to actually read everything for this demo. In this first demo — let's first take a step back. We're still at the point in the journey where I've got 300 malware samples that I need to investigate. My normal process is: okay, I've got this malware sample on the left, and I have Binary Ninja open. This sample is a Linux rootkit — I actually know that, because I wrote it. That would make the process a little easier, but stupid me, I also obfuscated it just a little. One slight note, and this comes back to why we did the original Linux research: again, Linux is very under-researched. That means Linux malware also employs very simple tricks to obfuscate itself. Just a few string obfuscations and you've bypassed whatever security protections you have on Linux. Normally I would start Binary Ninja, and first I need to figure out the starting point of this malware, and then go through all this crap to figure out what this function does, and what the following function does. For our research, I was particularly interested in which hooking techniques were being used in the kernel: what kind of technique, and which syscalls were actually being hooked. To figure that out, you need to find out where the hooking function is being placed. How are they doing it — are they using ftrace, kprobes, maybe eBPF? Who knows. But to do that, you need to clean up this binary, and that takes some time. Now, on the right side you have Visual Studio Code. The chat you see here is Roo Code, and I hooked it up to Binary Ninja. I'll quickly show you the configuration for the MCP server. Basically, configuring it is really easy — all the MCP servers come with instructions for setting them up, but you just tell it where the MCP server is located and how to run it, in my case with Python, and you allow all the functions you want the LLM to be able to use. Really easy, but not that important to understand for today. Now, the tools you provide to the LLM — I've got them listed here — are basically all the functions I would use in my day-to-day job: I want to rename variables, rename functions, define types and classes, sometimes take a hex dump. And if you look really closely — let's see if we can zoom in on that — here you see the information you put into the MCP description, and this information gets sent to the LLM when you make a request. Let's see... here we go. Now, let's assume you've got everything set up quickly, and we want to start reversing. Because I'm horrible at typing while people are watching, I'll just copy and paste this — and this is all I'm going to say: just "reverse engineer this binary". Stuff is going to happen now.
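For reference, the client-side MCP configuration he skims past typically looks something like this — the field names follow common MCP-client conventions such as Roo Code's, but the path and tool names here are hypothetical:

```json
{
  "mcpServers": {
    "binary-ninja-mcp": {
      "command": "python",
      "args": ["/path/to/mcp_bridge.py"],
      "alwaysAllow": ["rename_function", "rename_variable", "get_decompiled_code"]
    }
  }
}
```

Which tools you allow matters in practice: a tool the client has not been permitted to call simply won't run.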
Apparently I didn't allow... ah, this is good stuff — the demo is failing horribly. Just quickly... So...
well, this is going to be fun. Okay, I'll just talk you through it. Ignore everything on the left. What normally happens is this: I'm not telling the LLM what it's supposed to do. Looked at from the human side, I went up to the LLM, slapped a sample in its face and said: "Reverse this. Do it." I'm not giving it any information; I'm not telling it what I want as a result. So the LLM starts making its own decisions. Now, one thing an LLM is inherently very, very bad at is making decisions. I've been testing this technique for quite some time, and what we have here is a standard MCP setup where I enabled all functions — 54, off the top of my head. I've realized that if you go over about 20 different options the LLM can choose from, it's going to do stupid stuff. It will just decide: oh, let's try this tool — and the output doesn't do anything special. Let's try this tool. Let's get the C code. No, let's get the disassembly. Let's get the high-level intermediate language, the medium-level intermediate language — all for the same function, because it just doesn't know how to make the right decision. Normally that wouldn't be the biggest of issues, but I want to zoom in on just one part here: in Roo Code, you can actually see how much every request is costing you. And you'd say: not even a cent — not that important. Let's scroll down — oh, we're up to almost a cent. Going from the top, this number only goes up with every request. Now, why is that? An LLM, contrary to what you might think from using the web interface, does not have a memory. It's not built to remember what you asked it before; it doesn't know how to do that. So we simulate a memory. How? Every time we send a request to an LLM, we send not just the new question but the entire history — everything it already knew, we send again. So every agentic round trip makes the request bigger and bigger. Here it's a couple of requests, a cent per request, but I've been reversing binaries — if you're really into reverse engineering and have ever tried to reverse a Golang binary, that's going to take hours.
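The "simulated memory" just described — resending the whole conversation on every turn — can be sketched as follows; send_to_llm is a stand-in for a real API call:

```python
# Each turn re-sends the ENTIRE conversation, so request size only
# grows. A real client would POST `messages` to an API here.
def send_to_llm(messages):
    return f"reply to {len(messages)} messages"

history = [{"role": "system", "content": "You are a reverse engineer."}]

sizes = []
for question in ["What is the entry point?",
                 "Decompile the main function.",
                 "Rename the variables."]:
    history.append({"role": "user", "content": question})
    reply = send_to_llm(history)          # full history every time
    history.append({"role": "assistant", "content": reply})
    sizes.append(len(history))

print(sizes)  # → [3, 5, 7]  (each request carries more context)
```

This is why per-request cost keeps climbing during a long agentic session even when each individual question is short.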
And at some point the requests are going up to 10 cents, 20 cents, 50 cents, a dollar, five dollars. Yes, I made that mistake: went for a coffee, came back, and my boss had a $5,000 LLM bill. I had some explaining to do. [laughter] Now — somehow this demo is not going to work, but what matters is that you need to start giving the LLM proper instructions. Instead of letting it waste time figuring out what to do, you give it instructions: okay, start by finding the entry point; then follow the flow of the binary and disassemble every function you encounter; and when you figure out what a function is actually about, rename it — I'm giving you the ability to rename functions. So in the end, what we would normally get is a binary with all, or at least most, of its functions renamed — a binary that I as a human can actually start working with. This worked. Going back to the 300 samples: instead of spending four hours on every sample, I cut it down to maybe half an hour to an hour per sample, which is a really, really big improvement.
But it was still a lot of work, and the deadline was creeping up on us: you need to submit your paper, the research needs to be done, you need to submit the slides a couple of months in advance. So time was definitely running out. Before I go into the next part of the demo, I have another story from this whole journey to tell, because we messed something up really badly. We got the pipeline working. I started engineering prompts, and I actually got the LLM to the point where it was outputting nice JSON documents that I could feed to my Python scripts and run nice statistics on. My data engineering friends were really happy, because it was generating nice graphs. But then — I was using Claude by Anthropic, Claude version 3.5, and Anthropic decided to do an upgrade: Claude 4.0 was born and released. I'm a tech junkie, so I always want to use the newest version, and I thought: yay, new model, probably going to be even better. So I just flipped the switch and went to Claude 4.0. And the first results were amazing: I opened up Binary Ninja and saw it was doing great — understanding functions faster, better. So I thought: hey, this is working out. Then, 50 samples down the line, my Python analysis tools were breaking, because the JSON it was outputting wasn't the same as before. I thought: oh crap, something is wrong. Now, an upgrade to a model can be a good thing, but there's a kind of change you need to keep in mind, and for that — back to a slightly boring chart — let's explain how a model actually works. If there's any AI or ML engineer in the room, or anyone in the field, I'm terribly sorry for what I'm about to do, but this is a really high-level explanation of an AI model. What you see here is a basic setup. An AI model takes some input on the left, does calculations in its neurons, or nodes, and outputs a number — input and output are always numbers. So let's say we put in the number five and we want it to do something very simple: multiply it by two, so it should always output the number ten. Now, we set up a model, choose all the calculations it does — the weights and numbers in the middle — at random, then feed it the number five and see if the number ten comes out. Normally it doesn't, because we chose everything at random. So what do we do? We try again: generate all the weights at random and see if the output works. That's not going to work either. So we try again and again and again, and we keep trying until we finally have one model, somewhere down the road, where by accident we chose the correct numbers to actually give us the output we want. Now imagine that this setup is basically the brain of your model.
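His pick-random-weights-and-retry description is a deliberate simplification; real training instead nudges the weights toward lower error on each step. A toy version of his multiply-by-two example, with a single weight trained by gradient descent (my illustration, not from the talk):

```python
# Train one weight w so that w * x approximates the target 2 * x,
# using plain gradient descent on the squared error.
w = 0.0            # start from an arbitrary (bad) weight
lr = 0.01          # learning rate

for _ in range(1000):
    x, target = 5.0, 10.0
    pred = w * x
    grad = 2 * (pred - target) * x   # d(error^2)/dw
    w -= lr * grad                   # nudge w to reduce the error

print(round(w * 5))  # → 10
```

After training, w has converged to 2, so the model reliably maps 5 to 10 — and two models trained separately can both get there while differing everywhere else, which is his point about "personality".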
And I'm training two versions, 3.5 and 4.0 in this case, with the same data — maybe even bigger, better data — but it's not the same model anymore. The capability and intelligence of the model has gone up, but — and I'm even afraid to use this term — its personality has changed. Now, models don't have a personality, but I do... think of the decision tree it goes through as talking to a twelve-year-old kid. Say you have two very, very smart kids — young Sheldon, if anyone has ever seen The Big Bang Theory or Young Sheldon — two extremely smart kids who know a lot of stuff, and you're going to ask them to do their homework, take out the trash, whatever. One kid responds really well to a direct question; with the other, you need to explain: hey, I really want you to do this, because such-and-such. And LLMs are actually not that different from children: they're very, very smart, but you need to find the right way to talk to them. So if you're in the process of building a pipeline, it's important to make sure you understand how the LLM you're using actually works, and that you don't change models in the middle of development. Now, going back to the story: I had a half-decent solution, but I was still manually sifting through everything, which wasn't good. So I decided I needed to put this into an automated workflow — I really don't want to touch all this stuff; I want the models to handle it for me. So what I did next was start using n8n. I'm probably pronouncing that wrong — there's a lot of discussion about how you actually pronounce it; I think it's "n-eight-n". The company itself doesn't tell me exactly how to pronounce it, so let's keep it at that.
What I did is replace my Roo Code client, and the things I was typing into it, with an n8n workflow. So let me quickly update the chart: n8n is my new client, the rest stays the same, and instead of Binary Ninja as a GUI, I'm using it in headless mode, meaning there's a terminal in the background doing the work. The workflow itself — oh, I totally forgot: n8n, if you don't know it, is a tool for low- or zero-code development where you can visually connect all kinds of services. You could say: if I get an email from X, I want to send a Slack message to this channel and save it into a Word document — and automate all of that. I used it primarily to create a quick proof of concept where I could click everything together. What you see here is the workflow I used to replace the GUI. You have an input where I tell it: this file is the one you want to start reversing. That goes to my AI agent, my lead reverse engineer — let me show you what that looks like. This is my reverse engineer, and here you can see the prompt I'm using. As I said at the beginning, it's not important to actually be able to read it, but by this point in the project I was giving it these instructions instead of just "reverse my binary", which worked out better. Now, if you start this workflow — and let's hope this works... awesome, this actually works. At the top you can see the workflow running, doing its thing. The reason I'm showing you this is that in n8n you actually get quite an interesting view of what it's doing. Here you can see the MCP client getting all the information from the binary, and in between, all the requests being sent to the LLM. Now pay attention to this number: the amount of tokens you're sending. As I explained earlier, the token count never shrinks; it only grows. So by the time we start reaching something useful, we're up to 5,000 tokens per request, because all the decompiled code is being added to the request. For me as a human this is completely unreadable, but a machine can actually make sense of it.
Let me just — I've got 10 more minutes, right? — looking at the organizer — oh, okay, let me speed things up a little. This is all good: it's automated, I can run it in the background, I can tell it "here are 300 samples, do your thing" and it outputs everything, which is good. But I'm still at the point where I'm giving it a sample, slapping it in the face and saying: reverse this, be done with it. Now, if I need to reverse a sample, I use more than just my disassembler. It's a good tool, but the more information I can get about a sample, the better. So what I did was expand the workflow, adding what I call here a support proxy: a simple HTTP tool that collects data about the file I want to analyze before I send it to the LLM. So I reach out to VirusTotal: hey, do you have a file report? Do you have behavior reports? Put that in. I use Detect It Easy, a simple tool for identifying packers. I use capa by Mandiant, which gives you a lot of hints about what capabilities a file has. And I put all of that into one request, feed it to the model — and of course, that fails horribly. Why? I've got a pre-run example here that should... yeah, there you go. This is my workflow for getting the VirusTotal information, and if you look — don't mind the details — if you've ever worked with the VirusTotal API, you can see the reports are huge. This is a lot of information, even for an LLM to process — way too much. If I sent this to my reverse engineering LLM and said "hey, you've got this information, be happy", that's not going to work, because I'd be giving it a hundred thousand tokens just to get started. So you need more LLMs in this workflow. I've put in a summary bot, a cheaper LLM — I think I'm using GPT-3.5... o3-mini — because all it needs to do is take all the information from the VirusTotal reports and the other tools I'm using, summarize it, extract the things that are relevant for the reverse engineer, and send that over to the reverse engineering workflow. Now, in the end the reverse engineer did its job, and what you get out of it is a very nice malware analysis report with a description, the interesting functions it found, and even some IOCs — IP addresses and such.
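The support-proxy-plus-summary-bot idea can be sketched like this. The endpoint and header follow the public VirusTotal v3 API; the summarizer here just trims fields instead of calling a cheaper LLM, and nothing is actually sent over the network:

```python
import urllib.request

# Build (but don't send) a VirusTotal v3 file-report request, the way
# a support proxy would before analysis starts.
def vt_file_request(sha256: str, api_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"https://www.virustotal.com/api/v3/files/{sha256}",
        headers={"x-apikey": api_key},
    )

# Stand-in for the "summary bot": a full VT report is far too many
# tokens, so keep only what the reverse-engineering prompt needs.
def summarize_for_llm(vt_report: dict) -> dict:
    attrs = vt_report.get("data", {}).get("attributes", {})
    return {
        "detections": attrs.get("last_analysis_stats", {}),
        "names": attrs.get("names", [])[:5],
        "tags": attrs.get("tags", []),
    }

req = vt_file_request("00" * 32, "REDACTED-API-KEY")
print(req.full_url)
```

In the real pipeline the trimming step is itself an LLM call, but the shape is the same: big raw report in, small relevant summary out, then on to the main reverse-engineering agent.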
it found and even some IOC's so some IP addresses and stuff and this is basically the output of a very very basic reverse engineering um report um but we're not done yet Um um myself, my team, everyone within the reverse engineering community, we have done years of work reverse engineering uh malware binaries and we've put all that information into our IDA databases, binga databases. Um we marked up a lot of uh code. So it is a shame if we not use all that knowledge. If I start investigating a sample that that I already or a family that I already investigated, I'm going to open my old uh database. See what is in there uh and
see if I can just copy and paste uh most of the work. So let's give that capability to the uh LLM as well. Um I've updated the flowchart uh uh again and what you see here is I don't think I have enough time to go into the demo but um uh I gave another tool to uh to the LLM a rack proxy. So rack stands for retrieval augmented generation if I'm correct. I actually work at a company that does this but I don't even know the um now what does that do? It's another uh uh service that uses is a uh small language model uh specifically designed to uh make an embedding from a piece of
code. Uh what is an embedding? It's basically a list of uh numbers. You can see here I've got the this is the the database where I'm putting everything. Uh you can basically here's a uh a little bit of code that I extracted from uh some malware and my uh machine learning model translate that into something a model can understand a vector uh as it's called but it's a a list of uh numbers. The cool thing is uh this model uh is trained to understand actually what source code does. So I can give it uh open source C code. I can give it a mold reverse engineer code and if it finds a a certain let's say an algorithm or a
function and I give the same function to it in a different language um it'll know that it's actually the same thing. Um so that results into something like this. And I know I'm rushing uh but this is a uh the engineering uh workflow where my reverse engineer doesn't only have my MCP client but it also has a code similarity uh tool that it can uh can use. Now the code similarity tool is yet another LRM. Um because if you do stuff like factor search and you search for similarities, you don't get back one result, you get back 50 results and you need to figure out okay what what is the actual uh answer to the question that I'm asking.
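That nearest-neighbour lookup can be sketched like this. This is a toy illustration only: the real system uses a code-specific embedding model and a vector database, and every name and vector below is made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_matches(query, db, k=3):
    """Return the k database functions whose embeddings are closest to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in db.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Toy 4-dimensional "embeddings"; real code embeddings have hundreds of dimensions.
db = {
    "warmcookie_c2_loop": (0.9, 0.1, 0.0, 0.3),   # marked-up function from an old DB
    "openssl_sha256":     (0.0, 0.8, 0.6, 0.1),
    "generic_strcpy":     (0.2, 0.2, 0.9, 0.0),
}
query = (0.85, 0.15, 0.05, 0.25)                  # embedding of the unknown function
print(top_matches(query, db, k=1))                # best match: warmcookie_c2_loop
```

Note that even in this toy version the search returns a ranked list, not a single answer, which is why a second model is needed to decide which of the candidates actually answers the question.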
That's how vector databases work, unfortunately. Now, if you run this... oh, I actually ran it. So let's run it again and hope it finishes in time. While it's running: once the requests go to the code similarity tool, you will see that the main LLM is sending over a function, saying, "Okay, I want to know if this function is actually known." And here we see one of those requests: the main LLM asks, "hey, this is something that is happening," we do a code similarity search, and the second LLM figures out, okay, this is a function that has no real meaning. It explains what the function actually does, and that information is fed back into the main process. Let's see if it does one that's more interesting.
Here we go. So here it actually figured out that, hey, the function that you've given me is actually from a malware sample that we already know: it's called WarmCookie. And because we gave it a version of the source code that is already marked up, the main model doesn't need to figure out what this variable does or what that data variable does; it can just rename everything and use that information. Now, the thing is, this keeps getting more and more expensive, because we're feeding more and more data into it. And I was thinking, okay, "more expensive" is relative, because at the end of this analysis it would have cost $2. You would think, okay, $2, that's not that impressive. But we at Elastic have a malware analysis pipeline where we feed in information from all kinds of sources, and we analyze about 1.5 million samples a week. $2 gets really expensive if you want to feed everything into that pipeline. So that's not really an option. We do trim it down so we don't reverse everything that's not needed, but I needed a solution for that. And the solution is me stepping back again and going, okay: I know I'm using an LLM.
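The back-of-the-envelope math on why $2 per sample doesn't scale, using the figures as quoted in the talk:

```python
# Rough cost of running the full LLM-driven analysis on every pipeline sample.
cost_per_sample_usd = 2.00        # quoted cost of one full analysis
samples_per_week = 1_500_000      # stated weekly sample volume at Elastic

weekly_cost_usd = cost_per_sample_usd * samples_per_week
print(f"${weekly_cost_usd:,.0f} per week")   # $3,000,000 per week
```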
I know it's automated, but what am I as a human actually doing? Because LLMs are designed to mimic human behavior. Now, if I open a database, I start looking at a high level: what does this function do? Okay, I've got the main functions. Then I can close the database and forget about it for a week. Then I can go back and say, okay, I need some more detailed information; let's dig into these structures, dig into that function. And that is exactly what I'm doing in the next iteration of this workflow: I'm not giving the entire binary to the LLM in one pass, I'm doing multiple passes. The first pass gets a high-level overview. Then for the second pass I'm not sending the entire chat history back; I'm asking it to open the database again and now start looking for a few structures. So the chats being sent to the LLM are small again, and each one asks it to do a specific task. I did some tests, and within about four passes you normally get up to 80% of what a human reverse engineer would produce for the binary. Now, to close, just a few lessons learned. Prompts are model specific: you need to engineer your prompts for the model that you want to use. Even the best LLMs, the ones that can handle 20 million input tokens, awesome, but they do get overwhelmed; choice fatigue in particular is a very big thing. If you want it to do very specific things, make sure it only gets the information that it needs to do its task. Quality of input matters, but that goes for all machine learning projects. LLMs get really expensive really fast, so do pay attention to the cost. And LLMs are amazing, I'm using them for almost everything now, but they are not the best solution for everything. Keep in mind that if there's a repetitive task, or things you need to do in a batch script, sometimes it is better to write a few lines of Bash or Python to do the task at hand instead of having an expensive model do simple tasks. And that is it, right on time. Thanks for your attention. [applause] >> Thank you very much. Are you here for the rest of the day to take questions? >> I'll be here the rest of November, walking around. I will even be here for dinner, I think. Thank you so much. >> Fantastic. Thanks again. >> So, if you have any questions, come up to me. I'm happy to talk. >> Yeah. So there's something for everybody, and everybody's going to be a little bit out of your wheelhouse. That's part of the charm here. If you were very comfortable with the high-level strategy from Seaman, then this was a little bit techy; and if you liked this, then maybe Seaman was a little too high level. So we can discuss later, but now we have a 15-minute break. If you ordered a t-shirt, go pick it up; if you don't remember, go check the list. Have some coffee, have some snacks, and see you back here in 15 minutes.
All right, welcome back, everybody. Hopefully you got some refreshments and snacks. Our next speaker is a PhD in maths and a security researcher at Binary Security, one of our community sponsors. So thanks for that. And that is not connected to the selection process in any way, full disclosure. All right, let's give a warm welcome back to Sophia Lingquist. >> [applause] >> Thank you very much. I assume you can hear me, right? I will speak louder than the people coming in, so do your thing at the back. Right. My name is Sophia. I'm going to talk to you today about server-side request forgery. First, a few words about myself. If you want these slides, you can scan this massive QR code to get the PDF version. For once, the text is really huge, so this will be good. I work at a company called Binary Security, as Ryan just said. We are a small company with five people who all specialize in penetration testing and security testing; we also do some appsec and other security work, but mainly testing. I also have a PhD in maths. Luckily, I came to my senses during my PhD and got a real job afterwards. So I worked a few years as a developer, and then eventually I made my way over to pentesting, which is what I've been doing for the last three years now. The aim for today is the following. I'm going to tell you about a vulnerability class called server-side request forgery, which probably many of you have heard of before. We'll start from the beginning, and I'll tell you about the basics of this vulnerability and how it works. Then we'll talk a little about the impact you can get with it, and then I'll go through a bunch of examples of things we've seen in real tests and real research. Maybe there'll be something interesting there, who knows? And at the end, I'll maybe come with some conclusions. Okay, so let's start by describing what this vulnerability is. On the left here we have a user, or a hacker; that's me, you can recognize the hat. I send a request to a web server, and that request triggers the server to go and send a secondary request. Maybe it has to go and check some resource or whatever; it's doing something else on the back end. Where this request goes doesn't matter. The server gets a response, and then it sends a response back to me, the end user. Now, this response may or may not be influenced by the response to that middle request; it doesn't really matter. The vulnerability comes in when the end user can influence what request the server is making and make it do something it wasn't planning on doing. Okay. So we have some definitions. Server-side request forgery: that's what I just said. Obviously, the server was planning on making some kind of request; this was part of the functionality, presumably. The forgery part is that I'm able to make it make a request it didn't want to make. We distinguish between full SSRF and blind SSRF. A full SSRF basically means that I, the attacker, have full insight: I see the whole response from the forged request. And then you have blind, or partially blind, SSRF, where I can't see everything: maybe I get just the status code, or maybe nothing at all; I don't even know if the request has happened. With that, I'll show a little demo. This is an incredibly dumb app which allows you to paste a URL, and then it will download the URL and give it to you. This was vibe-coded by Gemini, which helpfully made it vulnerable from the get-go, so I didn't need to tweak anything to show you this demo. Okay. To begin with, let me put in a genuine image link. So this is a fish. It has downloaded it. Let us try. Oh, come on.
Okay. So what happened here was: I put in a legitimate link, I click download, and it goes and fetches the image. Going back, let's try to put in something which is not an image. I'm going to try the Binary Security about page. I lost the signal, didn't I? Yeah. Cool. Do we have a different... like a gong we can try? Meanwhile, we can look at the... Yeah. Okay. So what happened is I put in binarysecurity.no/about, and I got some badly rendered HTML. The reason for that: let's go and look at the requests which were actually made here. Do we find them? Maybe, maybe not. That's my post. Cool. Right. So I'm running this through a reverse proxy, so we can see the actual requests being made. When I input the URL binarysecurity.no/about, I see that in the response: I get the HTML from that web page. But I'm actually also getting a content type of text/html. So this thing is really vulnerable: it's literally just making a request and sending the response back to the end user, content type and everything. So far we have a crappy website which lets you get things from URLs. It's vulnerable, I say, but we don't actually have any impact yet. We have to actually do something bad here. What I mean by that is, we could for example see what happens if we try to talk to internal services. So let's start by just... oh, god damn it. Okay, I'm switching.
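For reference, the vulnerable "download a URL" endpoint being demoed can be sketched in a few lines. This is a minimal stand-in for the vibe-coded app, not its actual code; the route and names are illustrative.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlparse, parse_qs
from urllib.request import urlopen

class DownloadHandler(BaseHTTPRequestHandler):
    """GET /download?url=... fetches the user-supplied URL server-side and
    mirrors both the body and the Content-Type straight back to the caller."""

    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        target = params["url"][0]     # attacker-controlled, never validated
        upstream = urlopen(target)    # the server-side request: this is the SSRF
        self.send_response(200)
        # Mirroring the upstream Content-Type is what enables the XSS later on:
        # text/html responses get rendered in the app's own origin.
        self.send_header("Content-Type",
                         upstream.headers.get("Content-Type", "text/plain"))
        self.end_headers()
        self.wfile.write(upstream.read())
```

Nothing stops `url` from being `http://127.0.0.1:5000/` or a cloud metadata address instead of an image on the internet, which is exactly what the rest of the demo exploits.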
I could probably do an interpretive dance of the slides instead. That would be better than the real talk.
Was that better? Okay, let's give it a try. It's good that at least something happens; you guys can pay attention. What I'm going to try here is inputting an internal URL. The server is sending requests, right? So I'm going to try reaching something on the server itself. I know for a fact that the web page I'm currently talking to is running on the server, and I'm going to guess port 80, because I'm doing everything over HTTP. So let's see what happens if I try to fetch the page itself. It looks like nothing has happened. That's because it has, in fact, successfully fetched its own HTML. If we go and look at the request here... we didn't drop fonts, that's excellent, but I'm not able to navigate. Okay, cool. Right. So what happened here was: I input this internal IP, and the response I got was just the HTML of the website I'm currently looking at, which is why it looks like nothing happened. So I can successfully talk to internal services, and now I can hopefully begin to actually do something. Let's start with a port scan. What I'm going to do is try to reach port 1 on localhost. That's going to error out and say connection failed. I can try port 2, port 3, etc., and we can do a scan this way and see if there's anything else running here. In this case, I'm going to skip past me manually inputting every port instead of automating it, and I'm going to try port 5000. And now we get a secret admin panel, also vibe-coded, as indicated by the rocket ship, because no real human does that. Okay. I can try clicking on something on this page; that's not going to work, because now it's using relative URLs within this internal admin page. But if I really want to go and look at the user Alice here, I can go back and input that URL directly in my SSRF, and then I'm actually able to talk to these internal services and maybe do bad stuff: delete a user, create a user, read some secrets, or whatever. So this is one way of gaining impact: I'm able to talk to internal stuff, and that could potentially be very bad. What I haven't done yet, but probably should have done, was to send a request out to look at what this server is actually sending. So I'm going to put a payload here pointing to my collaborator. I won't be able to type this... let's see. Okay. And then I get to see here what the server is actually sending. So it's
sending an HTTP request out here somewhere. Let's see. Here you can see stuff like the User-Agent, so you get information about what's actually running on the server. That's nice. And then a classic thing you see is something like the following, where there's some API key, access token, or whatever included in the server's request, and by sending it out to me, I can just steal the secret. And that's cool, I guess. Let's see, was there anything else we wanted to do? Another thing we can do in this case, because it's just mirroring the content type of what it gets, is construct a very convoluted XSS. Here I have this other site, bins.cloud, which is hosting basically an XSS payload: it gives you an alert box which says "hello from XSS" and then prints document.location. So there it's coming from the bins.cloud domain. Whereas if I input that same URL here in my SSRF page, it will also serve this JavaScript, and here you can see that document.location is this other IP I'm working on. You maybe can't see it, but it is. Point being: I'm now running JavaScript in the context of this site. So if there's anything else sensitive here, I could now also XSS people. You need to set up some stuff because it's a POST request, but you can figure it out. There's another thing which is potentially very vulnerable when things run in the cloud, which is that cloud services have these internal metadata endpoints on special IPs which let you get information about the cloud resource you are currently in. So if I have a VM in Azure which needs to access other Azure resources, the way it does that is it goes and talks to this special IP, gets an access token, and then it can communicate onwards. Being able to do this via an SSRF could potentially be a disaster. The cloud providers have all tried to prevent this, because it was so bad, by requiring that a special header be present, and very often in your server-side request forgeries you won't necessarily have control over HTTP headers. In this case, I can try one of these cloud metadata endpoints. That's going to fail with a 400 client error, precisely because I'm missing the correct headers. I can't remember if I have examples later where we do have the headers, but that doesn't really matter; the point stands, I guess. Okay. So that was an example. Let's now summarize what kinds of impact we've seen examples of so far. You have an SSRF. What can
you do with that? At the very least, in that case, you could do an IP and port scan of internal systems. Here I just checked which ports were open and whether anything interesting was there, but I could also have scanned internal IPs which I can't reach from my position but could reach from wherever this server lives. Say you're living in a Kubernetes cluster or something: you can begin talking to other resources which I shouldn't be able to talk to from the internet. Kind of an edge case, and not something you do in a penetration test, but you could technically proxy all your traffic through this vulnerable site and then go and attack other sites. A bit of an edge case. You could potentially access unauthorized data or actions; we saw an example of that when we found the secret admin page, where you can start doing stuff which I definitely was not supposed to be able to do via my image downloader. You can maybe access hidden backend systems, which is kind of the same thing. Leak secret headers: you saw a dumb example of that which I had obviously constructed, but we'll see real examples of it later. Cross-site scripting: we saw one example of how that can happen. The cloud metadata endpoint we spoke about. Now, reading local files and command execution I'll give you an example of in a second. So let's jump into that. This is a real example from a test, and I could have taken basically the same example from any number of tests, because this is a setup one sees again and again: you have a server which wants to generate a PDF. The way one often does this is you have some HTML, you give it to some kind of headless browser to render, and then you make a PDF from that. But rendering HTML server-side in a browser means that what would normally be just an HTML injection or an XSS becomes an SSRF, or something worse, on the server side. Okay, so in this particular case, there was an example where the user fills in a form on this web page with some information, answers a questionnaire or whatever, sends it in, and the server generates a PDF with the responses. The request looked something like the following: we have a list of questions here in the body. I realize now that it's too far down and you probably can't see it, but it says there's a single question in the example list, with a title and an answer. In this case I have successfully answered "What's your name?" with Sophia
with a lowercase s. Right, the point here is that the thing I actually filled in in the UI was the answer to the question, a.k.a. my name. The first thing you try when testing this is: what kind of things can you inject into this parameter? Can I inject some kind of weird HTML payloads, weird characters, quoting, whatever? This was surprisingly well sanitized, actually; it wasn't possible to do much via this input, things were escaped, etc. However, it's a bit odd that the title of the question is being sent in my response, when that's not something I'm really in control of. Okay, so if things are done right, the server should use this title as a static lookup, as a kind of ID or name for that question. What happened in reality was that I could just modify the title in the request and make it so that I'm answering arbitrary questions. And not only could I change it: there was no proper sanitization happening here, because the assumption was that this is a static string made by the server. So here's an example where we've modified the title with an HTML script tag which fetches some JavaScript from my server. Sending this in resulted in a GET request being sent to fetch that JavaScript resource, and indeed, any HTML I sent in was rendered in the PDF I got back out. One thing we saw on this request being sent is the User-Agent, at the very bottom here. You can't see it, but it says HeadlessChrome, version 89. So that's telling you how this HTML rendering to make a PDF is actually happening; we'll get back to that in a second. Okay. One thing you can try here, since we're injecting... I mean, to be honest, we're doing more than just an SSRF here; we're really running arbitrary JavaScript and HTML on this server. In this case we can, for example, input an iframe where the source is a local file. That worked here, so we could read /etc/passwd or any other local file on the system. As a side note, it is really painful to exfiltrate stuff via PDF, because you can't get the text out of the PDF; you need to do some kind of text recognition, and then it's a bit wrong, and you have to tweak it to actually get the keys out. So that works. You can also do an internal port scan, like we were doing in the first stupid demo. Here we just scanned everything, and it turned out, though we could have guessed this already
without scanning, that on port 9222 there was something interesting happening. The reason, for those who know anything about Chrome, is that Chrome has a remote debugging port which runs there. That being open is obviously a disaster if you are internet-exposed in any way; here it's internal to the server, so by itself it doesn't matter. So here's an example where we send a valid request to one of the APIs on this Chrome debugger port, in this case just getting the version information, and that gives you a PDF back with the version nicely printed in it. So this works: we can talk to internal services, and we know there's something interesting running. We could also read all local files, so this is already a pretty serious vulnerability. And then, as a dumb end to this: the version of Chrome that was running, which we got from the User-Agent and also from this version output, is of course absolutely ancient, and has a known CVE with public exploit code, which we could just run, and it gave us remote code execution on the server. So that's an example of how you can start with an SSRF and end up with complete code execution. Cool. Now, in the case you just saw with the PDF generator, it's fairly clear how you should fix things: you should not let the end user modify the title and input arbitrary HTML and JavaScript; there was no reason for that PDF generator to start making external requests. But you can imagine cases where you really do want to make the request. In the image downloader, which also doesn't need to exist, I really was intending to make an external request. So what we very often see is that people try to block access to the dangerous things: internal IPs, cloud metadata endpoints, etc. And this is very often done via some form of deny-listing, whether that's a static list of IPs you can't talk to, or "if you match this regex you can't", or "you can only if you match this regex", and so on. There's just an incredible number of examples of this failing, so I'll mention some here instead of showing an endless stream of the same example. For example, trying to deny-list specific IPs: you don't want the server to make requests to localhost, say, so you say "don't talk to localhost or 127.0.0.1", but then you forget to account for different IP representations. So
there are many ways you can write this IP; we'll actually see an example of that in a second. Another one: you check that this is not an internal IP, but if it's an external hostname, you don't do the DNS lookup. So I can just have an external domain whose DNS record points to 127.0.0.1, and that bypasses the protections. Also, in the case of this cloud metadata stuff, it's of course very important to be able to smuggle the correct header in to actually get anything. We've seen examples of trying to deny-list the specific header you need: in Azure you need a header named Metadata with a value of true, in Google you need a slightly different one. So they try to deny-list "you can't have a header named Metadata", but if you send a header named " Metadata", with a leading space, it goes through, and it's passed along so that it ends up being the right thing on the back end. And then there are things like doing proper validation of the IP, checking the host, checking DNS records, being really proper about your IP validation, but then allowing the server to follow redirects, so your validation just completely breaks, because you start with a nice thing and then redirect to the bad thing. So here's a very short example, in absolutely tiny writing, so I'll try to zoom a bit. This is borrowed from my colleague Chris Hansen, who has an infinite supply of good bugs from Google; whenever you need something, you just ask him. Let's see if I manage this. Is that how you zoom? It was okay. Cool. Right. What's happening here is he's talking to this Google API thingy; it doesn't really matter what it is. There's an input which is some kind of URL that the server is going to do something with, and there's a bunch of deny-listing done here, so you aren't allowed to talk to internal services. However, instead of writing localhost or some other form of the local host, you can use the syntax "0", which is shorthand for 0.0.0.0, which in effect is shorthand for localhost. And then it went through, and he managed to talk to this endpoint and got seriously sensitive stuff. So that was a very short example, but I thought it was fun. Oh no, I don't know how to zoom. Okay, we're good. Cool. Right, I attempted to zoom and realized that didn't work. Moving on. Okay, here's a whole bunch of examples. I promise I won't only be talking about Azure DevOps, but it turns out that Azure DevOps is full of bugs. Also,
it's nice to talk about Azure DevOps because I can talk about it; I don't need to censor things. The examples I'm going to talk about now originally come from research by my colleague Todd, who found three server-side request forgeries in Azure DevOps, which you can read about in this first blog post; then I did some follow-up research and found another couple. I'll talk about four of these five, and as you'll see, there are some similarities as we progress. All of this starts with Azure DevOps, which, for those who haven't used it, is like a bad GitHub. Am I allowed to say that? Microsoft themselves are phasing it out, right? It's okay, I can say that. Cool. So there's this functionality where you can add something called a service connection. That's for when you want your DevOps instance to interact with other services; in this case, we're trying to set up a connection to Azure, so you want DevOps to do something in Azure. For the first example, Todd noted that one of the requests sent when he went through this flow was to an endpoint named serviceendpoint/endpointproxy, which I'll just refer to as endpoint proxy from here on, and which has a URL parameter. You probably can't see it in this request, but there's a long JSON body, and at some point there's a parameter called url. As a hacker, you start by testing: what happens if I put my payload in there? Will the server send a request out? And indeed, when he put his collaborator URL into this url parameter, it resulted in a ping to his server, and in that request you see exactly the example I constructed earlier: an access token included in the outgoing request. So he's leaking an access token from the server, because the server was planning to send it to the Azure thing you were trying to set up. In this case, it turned out that this token doesn't give you all that much, so it's not really that interesting, and Todd moved on. Oh yes, also: the response you actually get back from the request is "sorry, I couldn't parse the JSON". This is a hint that when you do your SSRF, you'll be able to read the forged request's response, provided it is JSON. That will be important later. Right. So, since this leaked access token didn't do all that much, Todd tries to escalate, and the natural thing to do when you're dealing with these cloud things is to talk to the cloud metadata endpoint. Attempting to input this special-purpose URL gives an error
message saying that you're not allowed to talk to this IP, and you get a 500 response. The same happens if you try to talk to localhost, and this also applied to all the tricks of obfuscating the IP or having a hostname pointing at it. It did not follow redirects; this sanitization presumably was being done properly. So we tried the other things: does it follow redirects? Can we have a DNS record pointing to something internal? This didn't work. But it turned out that in this massive JSON request, which you didn't actually see the whole of, there's a second URL parameter, which was empty in the initial request. So, of course, he attempts to do stuff with it. Now, I'm going to fast-forward here, because there's a bunch of discovery and weird stuff and strange error messages hinting at what you need to do: it has to be of this format, and if it's of this format, it has to match that URL. It turns out that if you set it to this special string, which looks like some kind of templating string, curly bracket curly bracket configuration URL, then you end up with a request being made to that original URL from the setup, except this time all the validation disappears. What's happening here is that Azure DevOps makes one request in one place, and if you hit a slightly different code path, it makes another request that does its own validation. That results in requests being made to these special-purpose IPs with no restrictions, and indeed, this is a pretty bad bug, which Todd reported. Here you see an example of the final exploit. As a side note, he's using a metadata endpoint here which does not require the Metadata: true header, which is very nice when you're trying to prove impact for these things, because then you can show that you genuinely are talking to internal services and getting responses back. The next part of the story is that Microsoft fixes the endpoint proxy bug, so when Todd tries the same exploit once more, he now gets an error message which is long and hopeless but basically says: no, you're not allowed to do this. Now we're going to talk about a technique known as DNS rebinding. Let's assume that the fix looks something like the following. Where I'm making this request, which used to be allowed to go anywhere, I add in a check: is this a forbidden IP? And let's assume that I do this properly. I check the DNS, I check all
formats of the IP. Blah blah blah. If it's forbidden, then I'm going to say nope, not allowed. If it's not forbidden, then I'm going to continue and actually make the request. Uh the problem here is that we have a race condition. Uh or well, we we have an issue where we are using the host in both places. Uh but um we we we aren't guaranteed that the host is going to point to the same thing in both checks here. So what DNS rebinding is is the following. I provide a host here uh which resolves randomly to one of two IPs. Uh so we do this we set it up with a very short time to live so that every
time the server talks to this host name it has to do a new DN D D D D D D D D D D D D D D D D D D D DNS lookup and check what IP are you now? Uh so then half the time it will resolve to a benign IP something like in this case one 0 0.1 which is whatever and half the time it's going to resolve to this internal cloud metadata uh IP 169254 169254 and then naively what's going to happen here if I provide this host name is that for the first check which is is this IP allowed half the time the DNS name is record is going to point to something
which is allowed and the check is going to say yeah sure keep going and then half the time on the second check if you've made it there it will point to the add IP and you'll actually get the internal data. Uh so Toy just tried this and indeed it worked and he just spam the same request a few times and then eventually it goes through because you hit both the conditions and we have another bug uh another bounty uh and Microsoft fixes it again. So that brings us to example number six uh endpoint proxy uh same endpoint we just spoke about. So surely we are out of SSRF vulnerabilities and endpoint proxy at this point. Uh now I kind of glossed
over this, but we had this funny templating-like syntax originally, which is tempting to look more into, because if you have something like template injection, this is a really bad bug. So it would be interesting to see exactly what's going on here. There were heavy restrictions on that: it had to start with this specific string, and so on, which was kind of limiting what you managed to do just by fuzzing this. Luckily for us, one of the reasons it's also fun to hack on Azure DevOps is that in addition to the cloud version, which I think is known as Azure DevOps Services, you have an on-prem version known as Azure DevOps Server. Because this runs .NET, the on-premise version is something you can basically just decompile and get the source code from, and so now we have the source code of Azure DevOps Server. Of course there's no real guarantee that it's the same thing as the cloud version, but if you look at the code it definitely is the same thing, because there's a bunch of checks like: am I in the cloud or not? So we now have code and can go and see what's up with this templating stuff. It turns out that this is being evaluated as something called a mustache template, which, amusingly, is a templating engine that was developed to avoid template injection. It's supposed to be a logicless templating system which is safe against template injection, and what it does is just really basic variable replacement. There are some screenshots here of the actual code which was running, and an example of how it would look. In this case it just goes and checks in a dict: is that variable named configuration, is that variable named URL, and then it just puts in the value. It does not do any fancy logic other than that. I say that, of course, but [snorts] Azure, or Microsoft, have their own implementation of how to evaluate mustache templates, which has a handy step where it adds in a bunch of helpers. Those are functions you're allowed to call within the templating syntax that isn't supposed to be able to call functions. There's a long, long list of these helper methods; you can see some of them on the right there. And another colleague of mine, Kristian, found one of these which was interesting. It was named get file content. So I decided to look a bit closer at that, because this one takes a URL. And indeed, here is an example of a request where I correctly use that templating function, helper method, thingy. Doesn't matter how
it goes. I've input two different URLs. I have the original URL, the one which Toius was doing SSRF in, and then, in this second parameter, I have a second URL going in, and both of these are set to two different of my collaborator payloads. The first thing I got was an error message saying that these have to be the same. Okay, so I'm making this outer request, which Microsoft now presumably properly sanitizes and doesn't do anything bad with, and I'm trying to do an inner request via the templating syntax, but it has to go to the same place as the outer one. So that's a bit sad, because the outer one is being checked.
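To make the mechanism concrete, here is a rough sketch of a logicless substitution engine with a helper registry. This is not Microsoft's actual code: the render function, the handling of the {{name}} syntax, and the wiring of a hypothetical get_file_content helper are all made up for illustration, but they show how a single URL-fetching helper quietly breaks the "no logic, just variable replacement" guarantee:

```python
import re

# Hypothetical sketch of a "logicless" template engine: plain
# {{name}} placeholders are replaced from a dict, with no logic.
def render(template, variables, helpers=None):
    helpers = helpers or {}

    def substitute(match):
        parts = match.group(1).strip().split()
        # A bare name is a simple variable lookup: the safe case.
        if len(parts) == 1:
            return str(variables.get(parts[0], ""))
        # "helper arg" syntax calls a registered helper function.
        # This is where the logicless guarantee quietly breaks: a
        # helper that fetches a URL turns template input into a
        # server-side request.
        name, arg = parts[0], parts[1]
        if name in helpers:
            return str(helpers[name](variables.get(arg, arg)))
        return ""

    return re.sub(r"\{\{(.*?)\}\}", substitute, template)

# Pure variable replacement is harmless:
print(render("{{configurationUrl}}",
             {"configurationUrl": "https://example.com"}))

# But registering a URL-fetching helper changes the threat model
# entirely; here the "fetch" just records the URL it was handed.
fetched = []
helpers = {"get_file_content": lambda url: fetched.append(url) or "<contents>"}
print(render("{{get_file_content target}}",
             {"target": "http://10.0.0.1/"}, helpers))
print(fetched)
```

Pure {{variable}} lookups stay harmless; the moment a helper can take a URL, whoever controls the template controls a request made by the server.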
However, let's see. Here's what happened: if I send the actual same host in both, then it does succeed, except it says "I expected JSON". So again, I can read any responses which are JSON. This is what it actually looks like if I put this collaborator payload in both places. First I get an inner request, which is coming from this templating helper, and then I get an outer request, where the path is the response from the first one. So I manage to get out the response from the first request, regardless of what it is, because it ends up in the path of the second one. But again, both of these have to go to the same server, so it's not like I can talk to internal endpoints or anything like that yet. So, here's a summary of what I just said. The outer host is checked for forbidden IPs, and it's also no longer vulnerable to DNS rebinding; they did fix that in the end. But it turns out that the inner request follows redirects. So the first basic check you needed in place, which they had on all these outer requests, was missing on the inner one. What we do is give an external host name which just returns a redirect to the cloud metadata endpoint. This passes the check, because it's an external endpoint; the templating request goes and follows the redirect and accesses the internal cloud metadata endpoint, and then the response of that is leaked in the path of the outer one. And we have another server-side request forgery in Azure DevOps's endpoint proxy endpoint. Now things begin to be funny, because Microsoft of course fixes this. Anyone want to guess how you could bypass the fix? DNS rebinding worked again. So the lesson here is that in Azure DevOps, every single request is a unique snowflake and you fix it separately; we do not use the same mitigations in more
than one place. If anyone wants to learn about server-side request forgery, I recommend going and looking at Azure DevOps. There are still like 50 more of them, but sadly the impact isn't high enough that Microsoft wants to do anything. But you can find them as a fun exercise. Okay. I have one more example here towards the end. This one I've again stolen from Kristian because, as I said, he has infinite supplies of Google bugs. Let's see, I'm going to show a video of that, if I manage this. Okay. So what Kristian has here is, again, in this Google API platform thingy, [snorts] a function where what you do is define an API from a swagger file, and this can go and fetch the swagger file externally. You give it a URL, in this case he's given some IP pointing at a swagger file, and the server is going to go and fetch that URL and then do some stuff. The first thing you try here is of course to do an SSRF via this URL: we have a URL parameter, life is good. But it turns out things are checked properly here. It doesn't let you talk to internal things, and it also doesn't really give you the response. Things are tricky. But what's very cool in this example, we'll see. Oh, help.
Um, let's see. Right. So the URL Kristian has put there points to his server, on which he's hosting a valid swagger definition. Let's pause that, and see if I manage to zoom again. So the swagger definition is valid; it has to be valid for you to get anywhere. But what he discovered was that when defining an endpoint, here /test, you can use this dollar-ref syntax, and what that says is just: go get this definition from a different swagger file. It's the way you reuse stuff. And here, on this one, he could now input an internal IP, and there was no check on this one. So by doing that, what he got was the following. The first request does presumably legitimate stuff. The server then goes and gets this dollar-ref swagger definition from within the main swagger definition. He then got an error message, because this is of course not pointing to a swagger definition; this is some internal Google endpoint which does scary stuff. So there's an error message here basically saying: no, I'm not able to parse the contents of this internal IP. And then it very helpfully tells you why it's not able to parse it, by telling you what the output actually is. So you are able to leak the response from any internal endpoint in this way, which again is a pretty severe bug. Okay. Let's see what else I have. Yeah, so the point in this one is that you have an initial SSRF which is blind and also properly checked, but then you have a second-order server-side request forgery in the swagger definition which is not blind and also has no restrictions. So this is a pretty cool example which shows that things can begin to get pretty complicated here: there's a lot of validation to be doing, keeping track of all these requests which are going around. So finally, I'd like to try to summarize a bit and talk about why we think these SSRFs keep happening. It's not like this is a new vulnerability. It's
one of the ones everyone has heard of: you have SQL injection, you have XSS, you have SSRF, you have all these funny acronyms. Presumably some of these vulnerabilities should be fixed at some point. Like, if you use a proper framework, you shouldn't be vulnerable to SQL injection; who knows if that's true. So, if I use a proper framework, maybe I shouldn't be vulnerable to server-side request forgery. The point I'm trying to make is that we keep seeing SSRF vulnerabilities again and again, and they keep being more and more serious, because there's more and more interesting cloud stuff going on on the inside. The problem is that, fundamentally, you do want to make this request. You have functionality where you do want your server to talk out, so you can't just block all requests from your server, because then you've broken your actual functionality. As we all know, the safest server you can have is one which is not plugged into the wall; that doesn't work in this case. In general, for these things, trying to allow-list instead of deny-list will work better. I didn't show any examples of this, but we've also seen a bunch of broken allow-listing, where you try to match that the URL has to start with something, or end with something, or whatever, and then you're able to do weird stuff. But deny-listing is extra hard, because it relies on you having a complete list of everything which is bad, and people always figure out new stuff, right? So you have to be very consistent when doing these things. Things to think about: are you following redirects? Do you check for DNS rebinding? If there are any sort of custom headers allowed, you'd better be very careful about what you're allowing. Also content types, like in my original example with the image download thingy.
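Put together, that checklist might look something like the following sketch (the function names are hypothetical, the allowed content types assume the image-download case, and TLS is ignored entirely). The key points are resolving the host exactly once and pinning that IP, which defeats the rebinding race from earlier; refusing private, loopback, link-local, and reserved ranges; not following redirects; and checking the content type of what comes back:

```python
import ipaddress
import socket
from http.client import HTTPConnection
from urllib.parse import urlparse

# Assumption: this fetcher serves an image proxy, so only images pass.
ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png"}

def resolve_and_check(url):
    """Resolve the host once and reject forbidden targets.

    Resolving exactly once and connecting to the returned IP is what
    defeats DNS rebinding: a second lookup at connect time could give
    a different (internal) answer.
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("no host in URL")
    ip = ipaddress.ip_address(socket.gethostbyname(host))
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(f"forbidden target: {ip}")
    return str(ip)

def fetch(url):
    """Fetch the URL while pinning the already-checked IP.

    http.client does not follow redirects, so a 3xx pointing at an
    internal endpoint is returned to us instead of being chased.
    """
    parsed = urlparse(url)
    ip = resolve_and_check(url)
    conn = HTTPConnection(ip, parsed.port or 80, timeout=5)
    # Send the original host name in the Host header so virtual
    # hosting still works, but the TCP connection goes to the
    # validated IP.
    conn.request("GET", parsed.path or "/", headers={"Host": parsed.hostname})
    resp = conn.getresponse()
    ctype = (resp.getheader("Content-Type") or "").split(";")[0]
    if resp.status != 200 or ctype not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"unexpected response: {resp.status} {ctype}")
    return resp.read()

# The metadata IP from the talk is link-local and gets rejected:
try:
    resolve_and_check("http://169.254.169.254/metadata")
except ValueError as e:
    print(e)  # forbidden target: 169.254.169.254
```

None of this makes a deny-list complete, which is exactly the speaker's warning; but pinning the resolved IP and refusing redirects closes the two bypasses the talk just demonstrated.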
There's really no reason why it should return anything which isn't a JPEG or a PNG. You should do proper checks on this: make sure that the things you are getting are really what you expect. If you have a website which is trying to get swagger definitions, check that you're getting swagger definitions. Okay, that wasn't enough in that case. Also firewalling: in general, if you have something running in a Kubernetes cluster, in some Azure VNet or whatever, you probably know what your server is supposed to be talking to. It's difficult to block the internal stuff, because that's often intended, but there's no reason why it should be pinging out to my collaborator server if the intended functionality is that it should get a backend resource, for example. So you can lock down a lot of things and make this a lot harder to exploit, and then maybe you end up in a situation where there's some tiny impact left, but in reality you can't really do anything, and you just have to live with that. So what was the moral of this? I don't know. There are a lot of SSRFs around the place. They keep happening. Be aware of it; it's going to keep happening. If you want to reach me, there are some links, and thank you very much for
listening. [applause]
Thank you so much. We still have a few minutes, if anybody has questions about the SSRF. Come for the demo, stay for the hot takes. [laughter] >> No? All right then. Give it up for Sophia. Are you around for questions? Yeah. Okay. You can ask your questions in private as well. Thank you, Sophia. >> Thank you very much. [applause]
>> All right. Up next, we have Ireina Ayanova. She has experience as a software developer at a security consulting firm, and she's currently doing her master's in web framework security. You see that segue there? We're still talking about web frameworks. Welcome to BSides. Give a warm welcome to Arena, everybody. [applause]
And we have plenty of seats in the front here. So there's no need to stand in the back. If anybody wants a seat, there are seats available. All right, take it away Arena.
Can you hear me? Okay. Hello everybody. I'm very excited to be here today, and I'm going to present to you an open source project that I started to work on several months ago. It is called the Unsafe Code Lab, and we'll look at what it's about. But a little bit about myself first. My name is Arena. I'm actually in my final year of a bachelor's in computer science at Maastricht University. I have experience as a security engineer, and recently I joined the Dutch Institute for Vulnerability Disclosure. Yeah, I just try to be nice. [laughter] Also, I'm a first-time presenter, so I'm super nervous. I just want to tell you about my
typical week as an appsec engineer. So on Monday, I receive code in Flask, a modern Python web framework, which says... oh my god. Okay. [sighs] So here we see an example of how we can retrieve the user ID using a built-in method in Flask. The next day I receive an Express.js microservice that shows a pretty similar pattern. And finally, FastAPI does the same thing, but it looks a bit different. So the natural question that comes to my mind is: they all look very similar, so is the threat modeling the same? And a quick show of hands: are there any pen testers here? Can we see some hands? Nice. What about bug bounty hunters?
Super. So some of you can actually recognize the first vulnerability here: it is the common HTTP parameter pollution vulnerability. And the question is: what happens if I find this vulnerability in one of the examples I showed you before? Will the same vulnerability be present in the other services, or will it just be similar? And the answer is: it depends. [sighs] Actually, it is known that in Flask and Express.js version 4, if you write something similar, it might lead to a confusion vulnerability, but in FastAPI, because it is safe by default, it is almost impossible to make confusion attacks. And a little introduction: in my experience, when I started to work with different codebases, I was very confused, because, you know, developers like to use the most shiny and new libraries, and every time I received new code it was a different language, different frameworks, and I was just so confused, I didn't know where to even start. The options that I found for myself were, first of all, capture-the-flag competitions. Very nice, I've learned a lot, and I recommend them if you've never played. But they have their own limitations. The main one is that capture-the-flag rewards finding the flag, not understanding why the vulnerability is present, and certainly not how to fix it. Another one is blog posts, and there are so many nice and very important blog posts out there, but they also have their problems: some frameworks are underrepresented on the internet, there are outdated versions, different code styles, and it's just so much overhead for you to go over and research, research, research. So we were thinking: hey, what if we just had one platform that has realistic and runnable vulnerable code, built for modern and popular web frameworks? And that's how I created the Unsafe Code Lab. It is built for appsec engineers, researchers, students and developers, just to see how the vulnerabilities work. And I
want to tell you how it works by showing a case study. We're going to talk about source confusion vulnerabilities. Okay, here we have the easiest example, so please pay attention, otherwise you're going to be lost. You know what, okay, let's not do that. [snorts] [laughter] Let's try to understand what source confusion is first. I'm going to use the analogy of the shell game. If you're not familiar with it: basically, you have three cups and a ball, and your goal is to keep track of where the ball is. In this analogy, the ball represents the user data, such as a user ID, and that can come from different sources, such as the body form, the query string, or the path argument, which are represented as cups. So imagine you see a request that passes the user ID in the body form. In other words, you see that the ball is under the yellow cup. It is natural to assume that no matter how you shuffle these cups, the ball is going to stay there, under the yellow cup. However, that's not always the case, especially if you've been scammed with this game once. What happens if there are multiple balls? Basically: what happens if an application is actually happy to receive the user ID from other sources too, such as the query string? Something was mentioned about this a bit in the previous talk, and I'll just try to explain it in other words. If you haven't watched SpongeBob, why? [laughter] [gasps] Okay, imagine we want to access SpongeBob's messages. We can do it with the path argument, so we're just taking SpongeBob's messages. But what if I pass Squidward's username as well? What's going to happen? There is already potential for a confusion vulnerability, and it's up to the application to deal with that problem. But there are more complicated things, because we can of course also pass Mr. Krabs's username in the body.
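Before getting to the real framework code, the three-cups situation can be sketched with a hypothetical mini request object. The class and handlers below are made up for illustration, but Flask's request.values has the analogous combined-lookup behavior:

```python
# Hypothetical mini "request" illustrating source confusion: the
# same parameter can arrive via the path, the query string, and the
# body form, and different lookups consult different sources.
class Request:
    def __init__(self, path_args=None, query=None, form=None):
        self.path_args = path_args or {}   # e.g. /messages/<user>
        self.query = query or {}           # ?user=...
        self.form = form or {}             # POST body

    def values(self, key):
        # Combined lookup in which the query string wins over the
        # body form: the same precedence Flask's request.values uses.
        return self.query.get(key, self.form.get(key))

# The auth layer checks the body form...
def authenticate(req):
    return req.form.get("user")

# ...but the handler reads the combined view.
def get_messages(req, inbox):
    return inbox[req.values("user")]

inbox = {"squidward": ["rent is due"], "spongebob": ["I'm ready!"]}
req = Request(form={"user": "squidward"}, query={"user": "spongebob"})
assert authenticate(req) == "squidward"  # auth sees Squidward...
print(get_messages(req, inbox))          # ...but he reads SpongeBob's mail
```

The authentication check and the data lookup each give a defensible answer on their own; the vulnerability is that they disagree about which cup the ball is under.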
And now you see that there are at least three sources where we can take the username from. I want to make it clear that Unsafe Code Lab is not about vulnerabilities in frameworks; we use frameworks to show how this vulnerability can be present. Now I'm going to use Flask as our example. Here we have Squidward. He wants to access his own messages. We have a request where we take his credential, pass it via the body form, and we just access his messages. But recently Squidward found out that if he passes SpongeBob's username, he's actually going to see not his own messages but SpongeBob's. This vulnerability can be present in many ways, and one example is as follows. In this method, when we are returning the messages, we use the method which returns the data from the query string, or else we take the data from the body. So basically we return messages from the body only if the query string is absent. And I want to make it clear that this is not the vulnerability itself; it is perfectly fine to write that. The only problem is that we have a confusion: when we authenticate the user we use the body, but when we retrieve the messages we use something else. Unsafe Code Lab is built in such a way that... you say, "Arena, this is very simple, nobody is making such mistakes, you're crazy," and I say, okay, my friend, but listen to me. Unsafe Code Lab is [clears throat] built in such a way that everyone, with any background, even students like me, can come and understand from the very beginning, and each example is built one upon another. So what we have here is basically the same vulnerability, but written in a slightly different way. In Flask, you've already seen this method in previous slides: we have the request.values method. It is basically a combined dictionary of two sources: it combines the query string and the body, and it prioritizes the query string over the body form. It is basically the same as what we've seen before. But the trick here is: how many people know about this? Maybe if you work in Flask a lot, you know about it. But what if you are not as experienced? What if you review the code and you've never seen it before? That's this built-in complexity: it's not only the vulnerability itself, it's framework design that not everyone is used to. So let's see more examples, just to make it a bit clearer. Imagine we want to steal the secret Krabby Patty formula
and we want to bribe Squidward, but he wants to have all the money now, and the bank actually only allows sending 50 bucks, which doesn't work for us. So when we try to send more than 50 bucks: yeah, tough luck. But if the vulnerability is present, what we can do, for example, is send a repeated query parameter with different values. The vulnerability here is that when we check what amount is being sent, we check the first parameter, but when we actually send, we use the last parameter. With this I just want to show you that it can happen across different sources, but it can also happen within one source, in a query string, just with a different sequence. Okay, let's return to this large, mysterious example. I don't expect you to read this code, but what it basically does is retrieve the messages of a group which the user is a member of. Again we see a very similar pattern when we check the group membership, that you are allowed to read these messages. We see something that we've seen previously, right? But now we are taking the query string and the path. This is not vulnerable by itself, but because in the main function we are retrieving the group only from the path, it makes it vulnerable. And I just want to make it clear: the examples when we just started were very easy and, I would say, obvious, like, okay, just don't do that. But when we go to more realistic code, you see that in Flask it's very popular to use decorators, and when you review your code you can just see this function, and to pinpoint where you need to look, you have to look in different files, in different directories, and that makes it a bit more unclear. [sighs] So let's just go over the example. Okay, Plankton tries to access the Krusty Krab messages, and okay, not today, buddy. But if he just adds his own group, one he is a member of, he is going to access the Krusty Krab messages. Why does it happen? Because, again, when we check the membership, we are checking the query parameter, but when we retrieve the messages, we're only using the path. So you may ask me: okay, Flask does the trick, so what? And I say it's not just Flask. I actually found out that this exact vulnerability, or class of vulnerabilities, can be found in CherryPy, Express.js version 4, Ruby on Rails, PHP, and
many more. For some frameworks it is much easier to make mistakes, simply because, as we've seen in Flask, there are methods that make your code more convenient and easier to write, but that also creates a foundation for mistakes and security flaws. But some frameworks got it right. So Unsafe Code Lab basically wants to take these vulnerabilities and check how they are present in different frameworks. For example, Django and Express.js also had this merged method, but they removed it in later releases, so that's a good example, right? Since I said that, let's learn how to fix them. I'll tell you how I would fix, or prevent, the vulnerability from happening. First of all: a single source of truth. Always use one method. Try not to use the ambiguous methods that allow you to have several sources, but if you cannot avoid them, be consistent across all your platforms. So, to wrap it up: this is the first release of the open source project. For now, me and the other contributor have covered Flask examples, and there are many more to come; we want to cover as much as possible. We want to make it very accessible for new people coming into appsec. We want developers to learn more about security. We just want to make the community a bit better, you know. So if it sounds like something you would like to contribute to, or you're interested, you can contribute on our GitHub. You can ask questions. Please spread the word. Email me if you have questions. Email me if you have criticism, if you're like, you know what, it sucks, it doesn't work; I would love to hear that. And yeah, I'm also open for a job, because I'm a student, I'm jobless. [laughter] So if it's something that you
like. So yeah, please email me and thank you for having me today. [applause]
>> Thank you so much, Arena. First-time speaker; we wouldn't even have known if you hadn't told us. So really, yeah, [laughter] fantastic. Thank you so much, and well done. >> Thank you. >> Does anyone have any questions for me about web framework security or the Unsafe Code Lab tool? >> Why did you start with Flask? >> I started last spring, I think. I just started to work with it and I was like, okay, it's something interesting to look into. >> With Flask specifically; why Flask first? >> To be honest, I just liked it more. It's easier to start with compared to Django; the setup is easier. And when I found something to catch on, on the source confusion, I was like, "Okay, I need to look into it more." But there are more vulnerabilities, not only source confusion; it's just the most interesting one, which I wanted to talk about. >> Great. Well, thank you so much. We made a donation on your behalf to mental health for youth in Norway. >> Thank you. >> And give it up for Arena. [applause]
>> All right, we've been through four talks. We've done a little red team, a little blue team, a little global geopolitics and subterfuge, and it's not even lunchtime yet. So, lots of demos. Thank you so much to our first four speakers of the day; it was really fantastic. And to all of you for joining us and participating, and to our sponsors, Mnemonic, Promon, and Defendable, for making it happen. We have a lockpick village downstairs, back by popular demand, with twice as much space as last year. And there will be prizes, I should say. So lunch is a good time to get down there and try to pick a few locks. I think we have a prize for new lockpickers: never picked before, but it turns out you're a natural. Maybe a career change in your future after this event. Yeah. And a prize for, let's say, experienced pickers. So check out the lockpick village. Pick up your t-shirt: if you pre-ordered, there are still a ton of t-shirts down there, and somebody paid for them already, some people sitting out here, presumably. So go get your t-shirts, and then we will have some extra shirts for sale as well. So that's it. Thanks for a great morning, and enjoy your lunch.
Welcome back, everybody. I'll give you a moment or two more to find your seats. There are lots of seats, as usual, up at the front. This is not a standup show; no one is going to roast you if you sit in the front row. It'll be fine, I promise. About the shirts: we have a lot of shirts downstairs that were pre-ordered, and the people who ordered them are here, because we checked with the badges. We cross-referenced. Enhance! We cross-referenced the badges that were not picked up with the names of people who had shirts, and almost all of you who own those shirts down there are here; you just need to go claim them. But other people also want shirts. So if you haven't picked up your shirt by the end of the first break (sorry, the beginning of the second break), then we will sell the shirts. Then we are giving the people who want shirts, shirts, and you, who have changed your mind, [laughter] will no longer be burdened with this corporeal possession. So, if you don't remember, check your ticket in your inbox, your email, and it should say your shirt size, if you ordered a shirt. So please pick up your shirts at your earliest convenience. And then, enough of that shirt talk. I'm excited to
announce our first speaker after lunch. We are zooming out a little again, to the big-picture stuff, right after lunch. We have Michael Marovich, who started out as a system administrator at a university, has been the CISO of a NASDAQ company, and has done a bunch of stuff in between. So I'll let him take it away. Please give a warm welcome to Michael. [applause] >> Thank you. Hi everyone. My name is Michael, and thanks, Ryan, for the introduction. I've been wearing quite a lot of hats, and it's a true pleasure to be here on this stage after, as I mentioned, applying to this conference several times. So it's really cool to be here. I've been doing security for more than 25 years. I started, as Ryan mentioned, as a system administrator, but did a lot of security engineering. I've been a security manager at a NASDAQ-traded company based out of Oslo, Norway, so this has quite a tight connection to the city. I've also been a startup adviser and eventually a founder, helping a lot of students find their way in cyber security, and beyond that I'm also an open source security engineer and researcher. This talk came after some conversations with my peers, and
we'll start with a small memory of last year. I think it was July 2024. I took this picture at Fornebu storsenter, a shopping mall here in Oslo. I was going to the pharmacy to buy some medicine, and there was a sign: our systems are down, they're not working, literally, please come back later. And this happened in many, many shops and facilities, many organizations across the country, and in many other countries. Do you immediately recall what the reason for that was? Yes. Okay. For those who don't know, it was the CrowdStrike incident, which affected quite a lot of companies. After that, I had a beer with my friend, who is also a security officer here in Norway. He asked me: so, we all read these data breach reports, we read all these security incident reports, we see all these outages happening, and how can we be prepared for that? How can we measure what we here in Norway can do specifically, like tomorrow, in the next month, in the next two or three months, in order to be a bit more ready, so that our users, customers, our people, don't just come to the pharmacy, see this kind of thing, and have to come back maybe the next day, maybe in two days, and so on? And there were several data breaches that are quite well known, Norsk Hydro, some others, Visma, that were mentioned today. But if you look at the whole statistics, trying to get to the root causes of why these things happened, we don't have this kind of countrywide, or Scandinavia-wide, statistics. So in order to get some usable results, we need to use international, US-based sources. And that's why I started thinking about what kind of information we can get here that can be useful in planning and maintaining our security effort. And then I decided: why not look into the
public reports of the authorities? I read some of the annual reports from the police, and some of the annual reports from the data protection authority here in Norway, and they were not very helpful, because they're too generic. They didn't give any actionable information, nothing I could immediately take to my friends and say: okay, this type of attack has a high probability and this kind of impact, and you should probably check whether you're prepared for it. Then I was looking for security breach information, and I realized that we have an excellent Article 33 in the GDPR, which says that every organization that has suffered a personal data breach should report it to the relevant authority, which in Norway is Datatilsynet, within 72 hours or faster. I thought this was an excellent source: should I look into these reports? According to the GDPR itself, the reports should include quite a lot of useful information: describe what happened, describe the damage from the incident, and describe what measures were taken, or will be taken, by the company, the data controller, to mitigate the breach. This is the law; you can't get out of it without consequences. I was assuming that companies do this, and that this is what we should focus on when looking into the data. In Norway there is an excellent portal called eInnsyn, where you can look for any public document that has ever been communicated to or from the governmental agencies. It is a live database, and you can query most of the information from it very easily. You can go to the website; you should sign up if you want this kind of info, but it is pretty straightforward. I used the English version; it is of course also available in Norwegian. I started exploring, and finally I made
a request for the data breach reports submitted to Datatilsynet by private companies since January 2025. I made the request in July this year, just to celebrate one year since the CrowdStrike event. This is how it looks; these are the reports. What happened next? I got a response from Datatilsynet telling me: you have requested more than 400 reports, and it will take quite a lot of time to process them, because they apparently do this processing manually, checking that the reports don't contain any sensitive data like health data, and they would have to go through it all by hand. We had a very nice conversation over email, and then finally a meeting where they explained how they work with these reports and how they collect statistics. I got quite a lot of interesting insights that don't typically end up in any of the public reports, and that was extremely helpful in preparing this presentation and the research I've been working on. A quick disclaimer: everything we are going to discuss today is public, so everyone can go to the website, download, read, and enjoy. I was tempted to put real company names into the presentation, but I didn't, let's say for ethical reasons, because I haven't discussed it with them in advance. If you Google, or go to the portal, you will find the relevant information; as I said, it is all public and easy to recover, but I feel the companies may want to speak up on these topics themselves. So everything here is public: not classified, not confidential, no secrets. This is what the companies disclosed, of their own will, about what was happening. So then
I got a more meaningful response, and over the next month they sent me all the reports I had requested. I got 500 responses, with more than 300 reports in the standard format, which was exactly what I wanted. In the cases where there was sensitive data, or they considered that they couldn't share it, I just got a formal response that they can't share this information because of a specific exception in the law. So I had 319 usable reports. For those who have never seen them, the reports look like this: a PDF file with a standard format. It has a case number at Datatilsynet, the name of the company that reported, the type of the message, and what the problem was in this case, and it goes on down the page. The next problem was how to process these reports and work with them. The quality of the data itself was quite average, because many of the fields, and many of the categories or choices the data controllers have to submit, are essentially voluntary. So I had to parse them both automatically, with an AI script, and manually, just to double-check. It was quite a bit of work to parse this data, but it was worth even just reading through it, because working with archives is always exciting: you feel that every report has a story, either a personal story or an organizational one, and someone could probably even write a book on it. I'm not a writer, but there is plenty of material. From these 300-ish reports, I excluded the most boring ones. Here are two typical examples of what I consider boring. In the first, someone was sending a mass
email to customers, for example, and put all the customers in the To or CC field instead of BCC, so every customer got the full list of recipients, and maybe their names and surnames, whatever was in the To field. Or, in the other pattern, data was simply delivered to the wrong data subject, and so on. These are non-technical, purely human-error cases, and there are lots of them; many more come from the public sector. I took some of them, but among private companies it is typically banks, insurance companies, some private healthcare providers, and so on. Not that interesting, so I excluded them. The final filter was: private sector, everything submitted to or handled by Datatilsynet in 2025, and only non-human, technical, and similar issues. That gave 93 unique incidents reported by 85 companies; some companies reported twice, three times, and so on. They were quite unlucky in 2025. That's the research scope. I could have gone further or deeper, but it is manual work for Datatilsynet, so they were not able to process it very fast. The history is available online for at least five or six years, though, so if they didn't have to do it manually, I would say there would be quite a lot
of useful information collected through the years. Now let's look into the numbers. Once I had processed the results and got them into some kind of queryable database, I decided to look at what's there. Of all the numbers I got, I will highlight only some; if you're interested in more details, we can have a conversation afterwards. Maybe you have submitted something on behalf of your own company and would like to compare, or analyze, or look into more details. One of the first things I was interested in was incident timings. We see that half of the incidents were resolved within a day, and the median for the remaining ones was 32 days, which from my perspective is quite insane. Even allowing that the data is of average quality and not fully accurate: typically a personal data breach, or incident, is noticed within a week after it has happened. If we look, for example, at the first presentation earlier today, there was a slide with a timeline of the parliament breach. I wanted to reproduce it, but I didn't have the source; it showed two weeks, from the 17th of February to the beginning of March, that the Chinese hackers were sitting in the systems. And here it is at least 7 days, and 32 days on average, before we get signals that something is wrong in the system. From my perspective that is quite long, and it's not remediation time; it's the time from when the incident happened until it was detected and the mitigation plan started. Sometimes it's even longer. That is observation number one, and I think that unless we are in the first category, within a day, we all have quite a lot of work to do here. The second finding I wanted to talk about is risk assessment. There is
a legal requirement to perform a risk assessment for any personal data breach, and only 20% of the reports explicitly mentioned that one was performed. From the countermeasures taken, I would say that in many cases a risk assessment wasn't performed at all. The question is: how do we create a proper mitigation plan if we don't know exactly what the risk is, how to measure it, and how efficient the measures are? The third finding: 30% of the incidents are directly related to a third-party service provider or cloud service. In the case of a data breach at one service provider, one infrastructure or application provider, we can see, for example, all the banks reporting similar incidents because they used the same data platform that got exposed to the attack. This creates a dependence. If these five banks, or one bank spread across all the regions, report a data breach, it will likely affect all the other organizations using the same solution, the same platform, and this is a multiplier for the negative security effect. One more thing I was surprised to see, although we get more and more alarming signals, is physical security. We live in quite a safe country, and physical security incidents are not that common; the statistics show that, but 5% is, from my perspective, already alarming. It's all kinds of equipment theft, or data theft through equipment theft. The more surprising part is that four of the five breaches here were vulnerable to the same attack method, the same vector. Do you have any ideas what it was?
Four of the five breaches happened because the fire exit was not locked from the outside. I started reading the first one, then the second, the third, and oh, the fourth, and it was amazing. Safety first, and fire safety is of course important, but almost everyone who suffered a physical breach failed on this. That was the slightly fun part. Now let's go a bit deeper and look at the reasons for all these attacks. We'll put the physical ones aside and look at the root causes of the rest. If we sum up all the issues, we get our usual suspects. Nothing new here. We talk about complicated or novel attack techniques, we talk about how we use AI in daily life, how AI impacts security and how security impacts AI, but the root causes this year, here around us, were phishing and credential theft, all kinds of misconfigurations, ransomware, and software bugs. Nothing surprising. Depending on how you count, at least the first two are related to the human factor, and the human factor is the most frequent problem. And what were the countermeasures for the human-factor attacks?
Very common answers: we fixed access control, we wrote new documents, we trained our employees. If you look at recent research, this really doesn't work anymore, because systems have become much more complex, and according to many sources, training efficiency has declined over the last 10 years. Training is not enough. You need automation, proactive monitoring, and all the other kinds of automated, program-driven controls to get the problem resolved, to get the risk covered. But we still use these measures, and apart from training, which is inefficient, they are reactive rather than proactive. Most of the companies submitting a remediation plan were thinking: oh, we'll train the employees and it will be fine. Then they have another incident in three months, report it, and say: we'll have training in place and it will be even better. But it doesn't work; we can see that from the series of repeat incidents. Most of the companies that submitted these kinds of answers didn't have a risk assessment, didn't have risk management, and more or less explicitly said: we just did what we thought was best practice. This standard way of thinking doesn't help us a lot, and it doesn't really mitigate the issues.
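To make the "automation over training" point concrete, here is a minimal sketch of a program-driven control; the function name and the threshold are my own illustration, not from any of the reports. An outbound-mail check like this would have caught the CC-instead-of-BCC mass mails described earlier, with no awareness training involved:

```python
from email.message import EmailMessage

# Illustrative threshold, not taken from any report: more visible
# recipients than this suggests a bulk mail that should have used BCC.
MAX_VISIBLE_RECIPIENTS = 5

def exposes_recipient_list(msg: EmailMessage) -> bool:
    """Return True if the message leaks a bulk recipient list via To/Cc."""
    visible = []
    for header in ("To", "Cc"):
        value = msg.get(header, "")
        visible.extend(addr.strip() for addr in str(value).split(",") if addr.strip())
    return len(visible) > MAX_VISIBLE_RECIPIENTS

# A mass mail with every customer in To: the exact human error from the reports.
bad = EmailMessage()
bad["To"] = ", ".join(f"customer{i}@example.com" for i in range(50))
print(exposes_recipient_list(bad))   # flagged: 50 exposed addresses

good = EmailMessage()
good["To"] = "newsletter@example.com"  # visible sender-owned address
good["Bcc"] = ", ".join(f"customer{i}@example.com" for i in range(50))
print(exposes_recipient_list(good))  # passes: the list is hidden in Bcc
```

A mail gateway running this kind of check rejects the mistake every time, whereas training only lowers its frequency.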
To sum up on the practical side: risk analysis, automation, and more transparency in maintaining the trust of our customers and users. These are the core things, because once we explicitly say what happened, and give a proper public explanation of the breach and the measures taken, our road map and our planning will be tracked and reviewed by a much wider audience than measures we implement internally and never disclose. That is the summary. Each of the cases we can look into has more detail about which APIs were breached and which security misconfigurations happened. Everything is publicly available, with a sheer load of detail in these reports, and it only emphasizes that our human effort and our regular knowledge and awareness are probably not enough to mitigate the increasing complexity and the increasing threat level we can read out of these reports. There were some findings I wanted to share as examples, some outstanding examples from the reports. There were some hopeless ones where people wrote that someone came into our servers and just encrypted them, as the main explanation of what happened, which
was kind of true, but rather stating the obvious. And they seemed hopeless about finding any kind of solution later. There were many reports like that. But I wanted to give some sense of the scale of the problems: while on one side we are fighting the Chinese hackers, what happens once we get outside of this building? The first case is a parking system. From March 2023 until December 2024 it was possible to enumerate all the parking receipts, to see who parked, when, and how, across the whole area; I think it was in the west of the country. They hadn't even noticed, and there had been ongoing attempts; they only noticed when someone started exporting the data at scale. The problem was an API that simply didn't check the referer, didn't check who was requesting the data, and had no authentication. The reasons are very simple; anyone dealing with the OWASP Top 10 on a regular basis can identify which vulnerability this is and how to mitigate it most efficiently. They say only one person was impacted by the incident, but further down they say there were three or four thousand different requests to get this data.
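For reference, the flaw class behind the parking case is what the OWASP Top 10 files under broken access control (an insecure direct object reference plus missing authentication). A minimal sketch of the pattern and its fix; all names and data here are hypothetical, nothing comes from the real parking system:

```python
import secrets

# Hypothetical records, only to illustrate the flaw class.
RECEIPTS = {
    1: {"owner": "alice", "plate": "AB12345"},
    2: {"owner": "bob", "plate": "CD67890"},
}
SESSIONS = {secrets.token_urlsafe(16): "alice"}  # auth token -> user

def get_receipt_vulnerable(receipt_id: int):
    # The reported pattern: sequential IDs and no authentication, so anyone
    # can walk receipt_id = 1, 2, 3, ... and dump every record at scale.
    return RECEIPTS.get(receipt_id)

def get_receipt_fixed(token: str, receipt_id: int) -> dict:
    # Mitigation: authenticate the caller, then authorize per object.
    user = SESSIONS.get(token)
    if user is None:
        raise PermissionError("not authenticated")
    receipt = RECEIPTS.get(receipt_id)
    if receipt is None or receipt["owner"] != user:
        raise PermissionError("not your receipt")
    return receipt
```

The fixed handler makes bulk enumeration useless: every request is tied to a session, and each object is checked against its owner.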
The next one: another organization where hackers got control of the general manager's account, and through this account they managed to pay 70,000 euros to some unknown account, and the company could never get the money back. It's less than the big headline incidents, but this was a small organization, and that is pretty good money; maybe a security engineer's salary for quite a bit of time. The reason was that they didn't have two-factor authentication. Common problems, nothing really new here, and a very severe outcome. Across many of the reports I was checking what the cost of the damage was, and a figure around 1 million to 1.2 million kroner kept recurring as the daily loss, or the per-incident loss. So if an average organization, not a huge one, not a disaster scenario, gets hit by a regular incident like this, in maybe 60-70% of cases the impact will be around 1 million. One more: a company reported that they had hired a security consultancy to screen their candidates for a position, and the consultancy managed to leak all the CVs publicly online, with all the personal details of the people who applied, their salary expectations, everything. And they had to ask this same company to remediate. That was a disaster, right? That was the fun part, which was actually not very fun, because these are simple problems. They are all around us, they happen almost every day, and we would expect that we know how to fix them, but it looks like they are much more widespread and much more common than we would love them to be.
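The 70,000-euro case came down to a single password with no second factor. For a sense of how little machinery a second factor actually requires, here is a minimal TOTP implementation following RFC 6238, using only the standard library; a real deployment would of course use a maintained library and per-user secrets:

```python
import hashlib
import hmac
import struct
import time

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: dynamically truncated HMAC-SHA1 of the counter."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(key: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP over the current 30-second time window."""
    if for_time is None:
        for_time = time.time()
    return hotp(key, int(for_time // step), digits)

# RFC 6238 Appendix B test vector: SHA-1 key, t=59, 8 digits -> 94287082.
print(totp(b"12345678901234567890", for_time=59, digits=8))
```

A stolen password alone no longer pays out: the attacker also needs the current 30-second code, which is derived from a secret that never crosses the wire.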
Have you seen this building in Oslo? It's in the northeast of the city, and it's an art object; in Norwegian, the word on it means trust. The concept of trust is quite important in everything we do in security, and every time I drive along Ring 3 past this building, I think about security and I think about trust. It's a good reminder of how we use trust in our daily lives. We'll come back to this art object a bit later, but first: we've looked at these examples, and how do they match the global landscape? Where are we compared to the whole world, compared to the industry? Here we can use the most popular source for that, the Verizon Data Breach Investigations Report. The most recent one came out this spring, in 2025, I think, and it has a comparison between 2022 and 2025. These are the most typical types of attacks, a few years ago and now. What we see on the left side, for 2024, is very close to the numbers we have from the Datatilsynet breach reports I analyzed, and that is the US-based, or global, ranking of intrusions. We see that direct intrusion is growing, social engineering is declining, and direct web attacks are growing, so we are facing more problems. We are now at about a third on the left side, and if things keep going this way, this year we are witnessing an increase in direct attacks. What does this mean? First, that protecting against the human factor with the human factor doesn't work anymore; we can't protect against human errors with training alone. The second point means that we still place a lot of trust in our providers, but our risk estimates, our perception of risk, are lower than
the reality. It means we are hoping for the best and relying heavily on trust, as we are used to, which is generally good for society but probably not that good in cyberspace. The global industry trends show that we should be more critical of our feeling of trust in security, look for the alarming signals, and take them into consideration more often than we typically would like to. This somewhat lower trust in what we receive as solutions from security vendors and IT vendors can also be carried over to our clients. Our clients trust us, and in the way we maintain that trust with our customers and users, whatever we do, we should probably take more responsibility when transferring that trust to others, and be more conscious that our partners may not be doing all that well with their own security maintenance.
I should say thank you to the Datatilsynet team, and to Christina Stanbrew, who helped me quite a lot with this report. If you have any questions, we still have some time left. The QR code contains a link to the website with the presentation, so if you'd like to revisit it later, you can download it. Thank you so much for your patience. [applause]
>> All right. Thank you, Michael. Does anybody have any questions about what Michael was able to tease out of these documents requested from Datatilsynet on Norwegian breach reports?
It's good to have a presentation after lunch, because people are very happy. >> Yeah. [laughter] >> Or happier than before lunch.
>> I have a question. You said there were five physical vulnerabilities, or incidents. What was the last one? >> The last one was the theft of a laptop from a public space, from a cafe: the person just went to grab a coffee, and someone came in, took the backpack with the laptop, and walked away. >> Anybody else? Questions? One more, or two more, from the side. Excuse me. Keep your hands up.
>> In terms of the incident with all the parking receipts: they made the claim that, due to the sheer amount of receipts, it would be impossible to infer any kind of personal information. However, today, with the ability to use an LLM to analyze a lot of data at once and pull that information out, how do we make companies understand that it doesn't matter that there's a lot of data? These days you can still specifically try to exfiltrate personal data if you want.
>> I think that's an excellent question, because I have not talked about AI at all; I mentioned that I used automation to process the reports, but I intentionally didn't say AI at all,
which is rare nowadays, so thank you for the question. I think we know that AI is extremely popular in personal use, while organizational use is not that successful yet, although there are a lot of projects; corporate AI solutions are not always successful yet. So people mostly think of AI as a personal area. I think there will still be a transition period before those who think at the corporate level understand: this is not just a personal issue, it will also be a company-wide problem, once they see really good results from AI tools that actually work and can be used, for example, for penetration testing, for source-code analysis, and other things. Then people will have more and more use cases where they understand this is a real threat to the data, and they will act accordingly. There was a quite recent publication on the new AI threat landscape and which threats we should consider, similar to the OWASP one, but it came, I think, from a W3C or EFF subcommittee a couple of days ago, and there are quite a lot of hints there on how to deal with this. Also, I was talking the other day about a security tool we used once to scan an open-source project we developed at the University of Oslo, and the result was so impressive that we are still hesitant about whether we should [laughter] share more information. It found so many issues in the legacy codebase that we would never have found otherwise: real, valid, critical issues we would never have found with any kind of non-AI-based review. The team is now fully convinced that they should treat it as an important vector for assessment and analysis. There was one more question there, I believe. >> Yeah, over here; it's coming.
>> Thank you for the talk. We are actually looking into a lot of AI
security and safety, especially around data governance. To build on the previous question: if data leakage actually happens, if data is exfiltrated from an AI system, say PII or PHI data has already been leaked, what can an organization do?
>> From a legal perspective, they have to pay the fine, probably, or there will be an investigation that gets into the root cause of the incident. From the public-perception side: I've seen many internal incident response plans that say, we get a notification about the incident, we investigate, we implement countermeasures, we report within 72 hours to Datatilsynet or any other DPA in any other country, and if the incident is critical, we notify the data subjects. But only in very rare cases does the company say to its customers: sorry, we had a breach, we failed. If you look at many public data breach incidents, whether AI-related or not, the companies fail on the public response, which badly damages the trust they could have kept, hurts the user base, and so on. Typically, if it is a B2B relationship, informing the key partners should probably be near the top of the list, so that everyone can prepare and, if they are affected, act. I have an excellent example of this; it is also in these reports, and it is public. There was a breach at an Oslo taxi provider this spring, and they had systems connected to partner platforms. As described in the report, they managed to promptly notify their key partners, to make sure the other parties were not impacted and could stop the integration until the incident was over. It was a rather rare example of an incident handled that way. On the other side, another company had an incident, and they submitted something like a 100-page PDF of their new security strategy, in which, with a full risk assessment, they addressed precisely how they mitigated the breach and how they built a proper security management system, nicely balanced with a cost analysis and so on. Maybe it should have been an internal document, actually, but it was submitted to the DPA, and now it's public. So if you're looking for inspiration on how to write a security program, I think you can find an excellent example there.
>> Great. Any more questions for Michael? Oh, keep your hand up. I'm coming.
That didn't hurt at all. >> To build on that as well: I think for many in this room the challenge is also getting budget to proactively mitigate risks such as those you investigated. Could you perhaps comment on the consequences? Did you look into that as well: fines, remediation, reputational costs? Or just your perspective there.
>> It's a recurring topic, and if you follow the fairly recent conversation about it on LinkedIn, mainly in the US-based community but it has spread quite a lot, some say it is easier, or more cost-efficient, to pay a fine than to implement a security strategy preventively. And many companies are, in a sense, inherently prepared to fail when they know the cost of an incident is not that high, which from the business side may be a viable strategy. From my security engineering point of view, my expert point of view, it doesn't sound as good as it should, but I would say prevention is expensive, and sometimes a company says: we'll just pay the fine, we'll get into a position to argue that we are not guilty, in some cases we'll pass it to the lawyers, and we'll get away with a lower overall investment or cost. I'm not sure I answered your question, but I think, yeah, that's the reality.
>> All right, thank you very much, Michael Marovich. >> Thank you. >> Give it up, everybody. >> Thank you. [applause]
>> Yeah, thank you. All right, without delay, I would like you all to give a warm welcome to Tom Barnia. [applause]
>> Good. Check. I think we're fine.
Okay. So, usually at these types of conferences, speakers spend the first three minutes talking about themselves or giving a very long introduction. Instead, I will tell you a story that is 100% unrelated to my talk, about one of the most surreal moments of my life, in the military. I was a commander of the cyber security course of the IDF, and the course was very intense. It's not a nice computer club: the trainees studied for days and nights in front of screens, solving impossible problems, and the atmosphere was real military discipline. All of us commanders were pretty strict. No smiles, no small talk; we kept some distance. Getting toward the end of the course, we thought about what we could do to melt the ice a bit, to make things a bit nicer, to show them we are human. One of my team members had the idea to talk to all the trainees' parents and collect pictures from their childhoods, cute pictures: someone in a Spider-Man costume, or eating ice cream, something like that. And so we did. A week goes by. We all entered the class; it was quite packed, and the trainees were pretty nervous to see all of us come in. Then we started showing the presentation, one picture after another, and indeed there were some very cute and funny pictures, and everyone was laughing and pointing at each other. We had one trainee who grew up in Russia, and because she didn't grow up in Israel, we expected to see somewhat different pictures from her childhood, but still something cute, eating a sandwich or something. Then we clicked next, and you see a cute girl in a yellow dress with pigtails, smiling a big smile. And next to that... I swear to you guys, this is a real picture. No Nano Banana AI.
It was pretty insane. Let's say the distance wasn't there anymore. Moving on to my talk: I'm going to talk about trust issues, how Gen Z attackers exploit without exploits, hack without exploits, using only trust. I'm going to show you a different type of attacker, a different approach to how attacks have been carried out in the last two years; this is something I've been researching for the last two years. My name is Tom. I work at Varonis, if anyone is familiar with the company. I'm a forensics specialist, doing forensic investigations for all of our clients; most of the big names you know are our clients. And I want you all to keep a question in mind during this talk. It might be a bit confusing, a bit different from what you're used to, but I want you all to think about your organization: if you were the attacker, how would you exploit your own organization's trust? What does it even mean to exploit your organization's trust? You will understand in a minute. This whole talk will be delivered as a playbook, from the attacker's perspective to you, chapter by chapter: how can you exploit trust and hack humans? Before diving into the examples, the stories, and all the incidents we
investigated at Varonis in the last two years, we should start with explaining who Gen Z attackers are, because I'm saying that like it's an obvious term, and it really isn't. So think about Gen Z. I'm 24. I guess there are some people here my age, and some people here could be my parents, but all of you are familiar with Gen Z. And there are some criteria of Gen Z that are really relevant to Gen Z attackers. Obviously they are young, somewhere between 15 and 30. Creative, untrained: they are not necessarily technical, but they are curious, they want to explore new things, they want to understand new
technologies and obviously they live with their parents so they the same the same laptop that they're using for gaming h it's the same laptop they hack with from their bedroom. And I think one of the most interesting things that people say about Gen Z is laziness that people are in Gen Z are lazy. You can say that if you want don't want to work hard and still get your gain, you're lazy. I will call it efficient. Uh this is another perspective and I think Jenzi attackers understand that simplicity is the new sophistication. Okay, you don't need to work really hard and you don't want to work really hard to get high impact. And if you think about the
gamer's approach, they want to win. And at some point in gaming, the game ends. You just exit the game. So from interviews we read about arrests of these types of attackers, they don't really understand the legal consequences. They don't really understand that it's not a game. And there is something really interesting about that, because the driver we know about, the motivation of these attackers, is not money, it's not a geopolitical agenda. It's mainly ego. They want to show off on Discord: hey guys, I got this data, I got this money. And this is really interesting. And I think the main thing you need to understand here is that hacking humans is much easier.
Manipulating humans is much easier than trying to manipulate systems or endpoints. Maybe it wasn't like that a few years ago, but now that every organization has so many security tools, you can work really hard on a very sophisticated malware, but the chance you will get the opportunity to reach the server or the endpoint, install the malware, and run it is not that high in a mature company. And you might think to yourself, okay, this is a nice idea, but it sounds a bit like a story. So, no. This guy, he's 22 years
old. He's a British guy. He was arrested a year ago in Spain. And he's the leader of one of the biggest cybercrime groups of our time, Scattered Spider, which I guess you have all heard of. And he's not alone. There were additional arrests of people aged between 17 and 20, and he's the leader of the group, a 22-year-old guy, and all of them were native English speakers. They weren't from China, Iran, Russia, the countries we usually look at when we do intelligence; they were from the UK. And if you ask Europol, 69% of European teens have committed some kind of cybercrime, which means you should go home and talk with your
children, because there is a chance they are potential criminals. And if you ask the FBI, the FBI will say that the average age for any type of crime is 37, but for cybercrime it is 19. Which means this is getting real, guys. This is a real thing and we should care about it. And if we go to the book itself, the playbook, before we dive into the chapters: the attacker starts with mapping the area, and mapping the area means mapping where the trusted and untrusted zones of the organization are. Okay? I want to understand
where you are already skeptical, and I don't want to mess with that, because I know I have a lower chance to succeed there compared to the areas where you are 100% certain that nothing bad is going to happen. And the untrusted zone is the things we were all made aware of: an external sender, a phishy website in the browser, unknown software from the internet. When we talk about the trusted zone, we're talking about task management apps, or apps we use in the organization on a daily basis, or an email from a colleague, or any communication channel, or the OneDrive we use inside the
organization itself. So remember that map, and we go to the first chapter: speak the language. The first thing the attacker will want to do is speak the organization's language. And I want to tell you a story, and I want you all to imagine that you are part of it. This is a real case we had a couple of months ago. Imagine it's 7:30 in the morning, you just woke up, and you are not a tech guy. You're just from finance or HR or something like that, and you start receiving thousands of emails, like 2,000 or 3,000 emails in two hours. Spam emails. Thank you for that.
Thank you for that. So you're going crazy: what's going on? I just woke up and it's Monday, not the best day of the week. So who are you going to call? Your IT guy. Your IT guy will help you. This is Bill. He's a great IT guy. He had some work to do, but he did it. He did everything you expected him to do: he reset the password, he revoked the sessions, he even created a very strict policy in Mimecast. He gave attention to all the untrusted areas. But that's not enough, because a few moments later, spoiler, the victim receives a phone call on his
phone, on Teams, and the name of the caller is Help Desk Manager. So he answers the phone and says: hey, I'm in a meeting, I will call you back in a few moments. Spoiler: it was the threat actor. And then he tries to call back the number of the help desk, because he submitted the ticket, and there's no answer. Then he tries again, and there's no answer. So think about the situation. The situation right now is that the victim is trying to call the threat actor, a few times, and the threat actor doesn't answer. The victim wants to give him control of his computer to do whatever
he wants, without any technical steps here. And this is a real screenshot of the Teams chat between them, and I hope you can all see it. You can see Help Desk Manager, and like a good IT guy, the threat actor sends him a message: working on another request, I will call you back. And then the victim says to the threat actor: okay, thank you, we will talk later. And then they have a Zoom session with remote control for something like 20 minutes; in another case it was 40 minutes. You can imagine what they did. I'm not going into that, but everything you can imagine is pretty much true. So this is one example of not
having any technical steps, not any technical sophistication, and still getting the gain, still getting what he wanted. And of course it wasn't you who fell for that, it was the other employees in the organization, not you, you are all great. And the next chapter, the next thing, is native sharing. By native sharing I mean every default sharing mechanism in apps, coming from a trusted user. For example, if you open your OneDrive and right-click, you will see the sharing option, right? And then you will receive an email notification: hey Tom, your colleague shared a file with you, please open this file. So if we
receive this kind of sharing from the apps we already use, it's all legit. So all good, and it's not something anyone warned us about. It's not an external sender, it's not a phishy link, just native sharing. And the email will look something like this. This is a real notification email from OneDrive, a normal email saying that someone shared a file with his colleague. Now put on your blue glasses, your defender glasses. This is what the user received. And then he opened it. OneNote, by the way, is a great choice of file to share. I'm not
getting into that, but it's a great choice; we can talk about it afterwards. And this was the file, a pretty normal file of the kind shared in the organization. Obviously there was a link in it. We're not getting into the link itself, but it was phishy. And then at Varonis we saw something pretty interesting when we investigated the case. If we now put on the red glasses: the attacker had access to one of the accounts, and instead of starting to steal data, instead of starting to do some manipulations there, he just created this file in the organizational OneDrive. This is the only thing he did. So we saw in Varonis that he created this
notebook in the OneDrive of the organization. And the next thing he did was share this notebook with something like 400 people. And what was our detection clue? We searched in Varonis for the events of "folder share link created" for the last 90 days. And we saw there were 412 such events in the last 90 days for this user, and 411 of them were on the exact same day, for the exact same file that was shared. So think about it: if you receive a file that exists in your organization's OneDrive, from your colleague, there is no reason to suspect that anything bad is going on.
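The detection clue described here, one account suddenly generating hundreds of "share link created" events in a single day, can be sketched as a simple aggregation. The event name and record shape below are illustrative, not Varonis's actual schema:

```python
from collections import Counter
from datetime import date

# Illustrative audit records: (user, event_type, day, target_file).
# The event name "folder share link created" mirrors the talk's wording;
# a real audit log would have its own schema.
events = [
    ("alice", "folder share link created", date(2024, 5, 2), "notes.one"),
]
events += [
    ("alice", "folder share link created", date(2024, 6, 1), "payload.one")
    for _ in range(411)
]

def mass_share_days(events, threshold=50):
    """Flag (user, day) pairs with an abnormal burst of share-link events."""
    per_day = Counter(
        (user, day) for user, etype, day, _ in events
        if etype == "folder share link created"
    )
    return {key: n for key, n in per_day.items() if n >= threshold}

alerts = mass_share_days(events)
print(alerts)  # one alert: alice's 411 events on the same day
```

The point is not the particular threshold; it's that per-user volume over time is the signal, since each individual share event looks perfectly legitimate on its own.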
You will just open the file. And this is another great example of how hacking humans is easier, and how trust can be exploited without any technical expertise. And the last chapter from the attacker's side is borrowed tools. There is a very popular tool among attackers right now; people from the red side might know it. It has all these amazing capabilities that every attacker can only dream of. Is anyone familiar with this tool? No? This tool is Atera. You might know it as an IT management tool, or you might know one of the others, which are all much the same. And those capabilities come straight from their website to what I
show you. And attackers really like to use these kinds of tools, specifically Gen Z attackers, because again, it's easy. They don't need to work hard, they don't need a lot of effort, and the impact is really, really high. So with all the RMM tools we investigated: first of all, the tool is often already there. When attackers have access to one of the laptops, one of the devices, they just check: is TeamViewer there, is AnyDesk there? They don't need to work hard, and it looks like benign activity, because if the organization already uses TeamViewer or AnyDesk or anything like that, no one will notice that there
was an abuse of this tool during the attack. And obviously it includes everything an attacker needs. Another interesting example we saw, and this was a really interesting one, is this app, ClickUp. It's like Trello or Jira. What happened here is the attackers created a user with the name of the CEO of a company. They knew this company was using ClickUp; it's not a secret, they just searched the company's job descriptions. They created an account in the CEO's name and sent invites to all the employees. And when you receive a direct invite from your CEO, with his name, and you know
you use ClickUp, you drop everything you're doing and just enter the dashboard. So they clicked on the invite, and it led them to their organizational ClickUp, because they were already using the app. It was just another dashboard, and obviously there was only one file on this dashboard, and obviously this file wasn't fairies and magic, it was a malicious file. And you know, you see all your friends there, all your colleagues, it's your CEO, it's on the trusted platform you usually use, and trust was very easily exploited here without any technical barriers. So after all these examples you might ask yourself: okay, my security tools, my stack, are pretty solid, but
all the traditional defenses can't really handle Gen Z attackers exploiting trust. And why exactly is that, even though, if there are some CISOs in the audience, you know that you pay millions for the tools? I think, first of all, trust gets you in much faster and more smoothly compared to exploiting a CVE, or compared to using malware of one kind or another. Trust gets you in much faster. The second thing is that we put a lot of effort into creating rules in our EDR, or patching every day, and there are some CISOs who say, okay, today you patch the patches you need, but we don't pay attention to the fact that
trust can bypass all of it; the EDR, the patching, it doesn't really matter. And the last thing, and I think Michael talked about this a bit, is a kind of illness that CISOs have, a security illusion. What do I mean by that? CISOs buy a lot of security platforms, for every niche thing, and they think: okay, I have so many security platforms, I'm protected, I don't need to work hard anymore, I spent millions on this. It's not really true, as you know, because trust is not part of that equation. And I think while we try to play checkers to protect our assets against threat actors and protect our walls, they are playing among us, with us,
just trying to be part of the organization so we won't notice them, and still doing the damage. So, very shortly, what you can do. A few things. I think behavioral detection is one of the most important, because it's not enough to identify that a virus or an illegitimate piece of software was used; you need to understand what is abnormal for a user, compared to his colleagues, compared to the organization, over the last 30 or 90 days. The second thing is user awareness, and not just awareness of phishing. There are trainings that say: oh, you identified that this is a zero and not an O,
oh, you are Einstein. No, guys, that doesn't work anymore. We need to talk about user awareness around context, around credibility, and around creating good habits. And the last thing is what we started with: rethinking trust boundaries around all of our apps. If we put it in a nice phrase: patch people, patch processes, and patch assumptions, and patch assumptions again. Gen Z attackers show us it's not about exploits, it's about credibility. And I want to circle back to the question we started with, after I've shown you all the examples and all the stories: if you were the attacker, thinking about your organization, how would you exploit your own organizational trust?
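Tom's first recommendation, behavioral detection, compares a user's activity to a baseline over the last 30 or 90 days. A rough sketch of that idea with a simple z-score over a single made-up signal (real behavioral analytics use many signals and peer-group comparison, so treat this purely as an illustration):

```python
from statistics import mean, stdev

# Illustrative daily "files accessed" counts per user; the final entry
# in each list is "today", the rest is that user's own history.
daily_counts = {
    "alice": [40, 38, 45, 41, 39],
    "bob":   [35, 37, 36, 40, 38],
    "carol": [42, 39, 41, 37, 4000],  # sudden burst on the last day
}

def abnormal_users(daily_counts, z_threshold=3.0):
    """Flag users whose latest day deviates wildly from their own history."""
    flagged = []
    for user, counts in daily_counts.items():
        history, today = counts[:-1], counts[-1]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (today - mu) / sigma > z_threshold:
            flagged.append(user)
    return flagged

print(abnormal_users(daily_counts))  # ['carol']
```

With a rule like this, a session of mass sharing or bulk access from one account stands out even though every individual action, on its own, was allowed and looked legitimate.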
Thank you very much, guys. This is a QR for my LinkedIn. Thank you. [applause] >> Thank you very much, Tom. Thank you. >> Are you around for the rest of the day? >> Yeah, I'm here. Totally here. >> So, if you're worried about your kids, come talk to Tom.
And up next we have returning speakers to BSides. Very happy to introduce Emlyn Butterfield and Veronica Schmidt. They work at Noroff University College as rector and assistant professor in cyber security, and they are also organizers of BSides Kristiansand, which I had the opportunity to join a few months ago. So if you enjoy this event and would like a reason to go down to Kristiansand, you should; it's happening again next year, right? Great. All right, we'll see you there. Please give a very warm welcome to Veronica Schmidt and Emlyn Butterfield.
Is this on? Okay, before we start, I think we should give a round of applause to the Oslo team. So thank you very much for doing yet another kickass event. Let's give them a hand. [applause]
Awesome. So we are not going to try and keep you here past the break, because I think chaos would break out. Obviously I'm here talking with Emlyn about one of my favorite topics, which is logging. And we're going to come at it from different perspectives: I'm going to play the role of the attacker, and Emlyn's going to be the good guy. Kind of sums us up, doesn't it? One bad, one good, one weird, one normal. So we are going to try and frame logging. Now, this is part of my PhD research that I'm sharing with you here today, because, like I said, you say "bad logs" three times and I will appear in
your browser. It's happened before. Want to introduce yourself? >> We have a whole slide for introductions. >> Worked very hard on these slides. >> Emlyn Butterfield, director at Noroff University College. My area of specialism, research interest, enjoyment, lots of things I suppose, is digital forensics, in particular application analysis on mobile devices, but I've also taught across a range of cyber security and digital forensics for a long period of time. >> I'm Veronica Schmidt, but you might know me as V. Short and sweet. I don't like long names, and only my mother calls me Veronica; normally it signals I'm in trouble. I'm currently the program leader for the DFIR degree at Noroff
University College, but originally from South Africa. I'm also very passionate about all kinds of events right around the world; the community is very important to me. But I've also done incident response for 16 years and never met a log that I liked. So I decided to do some research and find out why the information I need is never there. If you've done incident response, you've probably said: I wish I had that in a log file. I mean, if I had to ask you all, it would come up. So I asked the question: I know how to break in, I know how to defend, I know how to
investigate. But why aren't we building what we need? >> I'm just here to click the button. So I'm not going to speak. >> No, she's just too short to reach the stand. That's all. >> No, that's a true fact. And I'm okay with the fact that I'm short; I don't have very far to fall. So, great stuff. Logs are important. If you've done incident response, you know that you are going to go look at the log file to try and determine what's gone wrong. If I ask you: how frequently do you review your logs? Only when it breaks. Yeah. Do you review them for vulnerabilities? Nope. You're a special, special sausage, that is,
because not everyone does that. And before you fight me, I have the statistics to show it. The key thing here is: we want logs, we want good logs, but they don't exist. And I'll tell you why. As part of my PhD, I did a review of forensic readiness. There are over 30 models; there's a model for everything, called a forensic readiness model. What's the key thing they have in common? Good logs. But we've established those don't exist in a lot of organizations. Which means that whilst we have these great forensic readiness models that paint a picture of us being super at incident response, we have a missing key component. And that's what
my research is about: establishing what a forensic-ready log is. And we're going to share some of those insights with you today. I've been told I'm not allowed to say a bad word, so I'm going to go with the mom version. You know RTFM, read the freaking manual? I always say read the freaking log, because a lot of the time we think we're fine, then Log4j hits, you go look at your logs, and you realize you're screwed. Go read your logs. They tell a story you're not aware you're logging. So the key thing you need to take away is this slide: go back and do a log analysis. Go determine what is in your logs, because I can tell you
you've got a lot more than you think you do. >> Or a lot less, depends how you see it. >> The interesting thing that has come out of my research is the diverse perspectives of everyone looking at the same log file. You have developers: they see logs for the purpose of debugging. They want to be able to look at a log and determine what is broken. >> We practice this really well. >> He's a little slow today. >> I am, sorry, I was reading. From a forensic perspective, the log might be the only thing we actually have to prove or disprove, or to give us the story behind what's gone on. Well, ideally, what we want is the
physical device, but we know how easy it is to destroy evidence, to get rid of evidence; getting rid of all evidence of all stories is less easy. However, when the logs are really bad, that doesn't help anyone. Now, in all the pentesting and device purchasing (if you know me, you know I have a bit of an addiction, one can say, to buying medical devices, hacking them, and contacting the manufacturer; it's a fun little game we play), I go to the logs first. Like I said, it is a treasure trove of information. If I've done a pentest for a company, I will try to access your logs first. Your dirty little secrets are in there. You think a
keylogger is bad? Your logs are equally as bad. But we also know that from a legal perspective, from a compliance perspective, logs are important and need to be there. >> But the question is, and this is what's come out in my PhD: which one's right, which one's more important? And they're all equally important, right? Because we all want to use a single source for different things. Well, we're not building that capacity in, because the developers control the keys to what goes in the logs. Therefore they are building logs for the purpose of debugging, not for the purpose of root cause analysis, not for compliance, and not for the defenders. They are building them,
however, to the benefit of attackers. And when we look at compliance and regulations (my Norwegian is atrocious, so I'm just going to say you can read the slide), in terms of what needs to be there: we legally have to have information available that allows us to identify if something has happened, when it happened, and who performed it. That needs to be in place. So part of our security monitoring is that we need to have logs. NIS2 is probably my favorite thing, but it scares the crap out of me as an incident responder, because now we are actually required to report within a time frame, right? And if we take the 279
days we have traditionally taken to detect something, and equally long to figure out the root cause, we're in trouble. Cyprus has gone a different way and said: well, guess what, you don't have seven days, you don't have 24 hours, you have 12 hours to tell me that you have detected a breach, right? And it's only getting tighter. Am I too quiet? I'll shout better. Okay. So there are requirements around having the ability to report these incidents much faster, much quicker, more accurately. Now NIS2 is an interesting thing, right? After containment, you have 30 days to file your report. But when we've reached containment, we're nowhere near knowing what's actually going on. And how do we do root cause analysis? We
look at logs, right? In an ideal world we have multiple sources, but the logs become a source of truth. Meaning logs are more important now in our industry than they've ever been. They've kind of been the stepchild, you know, the one we have but don't pay attention to until we need to. But that's changing now. Veronica doesn't have a stepchild, so there are no concerns in terms of care. >> I'm nice. So, one of the things I wanted to find out: is this my bias? Do I have an issue with developers? Is this me going bad, bad, bad, or is this fact? So for my PhD I put out a survey, and
let me tell you, if you're studying, don't do surveys; they're a pain in the butt to get people to fill out, especially developers. Well, we did a global survey trying to find out what developers are thinking. What is the environment around logging? Is it even a problem? And we found some interesting results. Now, this is just like 2% of what we found. One thing is that a lot of developers do want to log conservatively. Ironically enough, it's because of the cost of storing the logs. It's not so much that they feel they need less; it's that it costs too much to store this data. But we also see some that just want everything: I
don't know what I might need, I might need that stack trace, I might need that response body. They want everything. Now, when you log everything without a strategy, you have a bunch of data that's unstructured and uncontrolled, meaning you don't know what's going into your logs. And then we have our third category: the ones who know they log too much. And this is kind of the cultural trend around logging for developers: they want it all, and some can't have it all because it's too expensive. >> But what is the difference between them all? I think the key difference is that some have a strategy, and the strategy is that it costs too much to log, and the others are
just doing what they think they might need. Now, one might assume that since we're in an era of GDPR, and the US has its own privacy laws, and there's compliance around data storage, that makes it all okay. I was pretty shocked by the fact that 45% of the developers said compliance influences their logging practices only in a minor way, which means there's a disconnect between what we're allowed to store according to privacy law and what goes into logs; the attitude is that it doesn't apply to logs, logs are fine, I can have everything in the logs. We have only 30% that have made a significant change, meaning they redacted, removed, or stopped logging something. And then obviously the last two are probably the most
concerning: those saying they don't know if it's had an impact, and those saying it has had no impact. It means there is a disconnect between compliance and logging. >> And do you think that will change with the new regulations? >> No, because there's no standard for how we do logging, what we do with logging, and how we strategically develop it. There are best practices, things like OWASP, but there's no real standard we can point to that says: this is what you should log, this is what you shouldn't log, and this is what a forensic-ready log looks like. Now, vulnerabilities in logs were the interesting thing. Until Log4j came
along and kicked all of our butts and made us sit up and take notice, people would not consider their logs a vulnerability; a log is a source of information, it's something that makes us stronger, not weaker. I then went to research: do we have common things we know we should not build? So, Common Weakness Enumeration, CWE; you probably all know what that is. These are known weaknesses we know exist and, as far as possible, should avoid building into software or devices. And all of these things are what I tried to figure out: do they still exist, or are we doing better? Now, you can see a whole host of different things that developers selected. But one of the biggest,
scariest things for a forensics person is insufficient logging. The majority of our developers indicated that, in fact, they know they're not logging sufficiently, and it's also a CWE. So we know we are doing it wrong. And again, more things we don't want to see: authentication information in logs. So I think everyone in this room can agree we have a problem with logs, right? It's not just me that has done some day drinking and now there are problems with logs; it does exist. I got access to the OWASP data, because I thought maybe it's my bias, maybe my data is skewed, I wanted to check, and they track similar things to what I tracked
in my survey. And the interesting thing here is that we saw we got better by 2020 and 2021, especially when we look at the incident rate of these CWEs. We see the highest prevalence was improper output neutralization for logs. Well, we see that everything gets better. However, the most interesting fact comes here: insertion of sensitive information got worse while everything else got better, which means there's a correlation between my data from 2024 and this data; in 2024 we are still inserting sensitive information into our logs. >> Now, from the forensics perspective, those logs might be the only thing that we have that can help prove or
disprove, or even just give us some inkling towards, what actually occurred. And without correct logging, without the right information inside the logs, it can be nigh impossible. And for many of you, that's probably not going to be of interest, because you just want to fix the vulnerabilities; you want to develop the systems as opposed to investigating them afterwards. From my perspective, the investigation is the real sexy side, and it is an area everybody should be interested in, because it gives you so much. And that jigsaw piece, trying to identify what's going on and who's done what, is hard, difficult, just like trying to protect the systems. And you can think of it from this perspective: security
comes in and stops the bleeding of a wound. Forensics comes in to explain what actually happened, who did what, and how that wound appeared in the first place. And if we continue with that: forensics shows who did what, when, and how. That's the ideal from the logs. Now, we already know, from the results of the surveys and from general talking to people, that many logs don't contain the information we need as forensic examiners, which means that, generally speaking, we can't come in and do attribution. We can't find out who did what and when, what the pattern of attack was, how people broke into the systems, what traversal they did.
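A log that can answer who did what, when, and how needs those fields recorded explicitly, not inferred from free text. A minimal sketch of such a forensic-ready record in Python's standard logging (the field names and event naming here are my own illustration, not any standard):

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line: a static event name plus explicit
    who/what/when/where fields, so every entry shares the same schema."""
    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": record.getMessage(),        # static event name
            **getattr(record, "fields", {}),     # dynamic, named values only
        }
        return json.dumps(entry)

logger = logging.getLogger("audit")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Who did what, when, from where, with what outcome:
logger.info("file.share_link_created",
            extra={"fields": {"actor": "alice", "target": "notes.one",
                              "source_ip": "10.0.0.5", "outcome": "success"}})
```

The design choice is the one the speakers hint at with static and dynamic strings: the event name stays constant, the dynamic values go into named fields, and nothing unnamed (headers, whole bodies) is dumped into the message.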
Because without the logs, what we're actually doing is inferring information. And when we infer something, just like normal people do, we will make mistakes. We have to interpret it in a certain way, and then in comes our bias and our general views and our understanding of the systems, which might all be incorrect, and then we become the human AIs and hallucinate in many different directions, which can give us challenges later on. The logs are the witnesses. Sometimes they'll make mistakes, sometimes they'll mislead, but we have to try and interpret them. That's the forensic perspective. >> So, my perspective, the kind of opposing perspective. And you might ask how I got to looking at
that perspective. It was a kind of journey: through doing purple teaming, blue teaming, then wanting to know how the heck did they get in? So I started doing red teaming, and I started realizing that that's the missing link: we don't understand the other side. I look at it as: log all the things, give me all the free information. I'm very, very, very lazy. So if you can pop a JWT token or an app key and secret in there, I'd be a happy girl. It is a treasure trove. And often when I get a device in, you'd assume that there's encryption and it's protected, but it is normally in a zip file in plain text, and
it's there for the taking as soon as I've got physical access. So for me, it contains the information I don't have to go scan for. But that doesn't mean that we can log everything, or should log everything. We have to be careful. We have to have a strategy for what that logging is and what should be included in the logs. >> I think the strategy should never be "we need more". More is better? It often is not, because the more you log, the less control you have. And it's a simple thing: long ago, when we did mainframe development, we used to have static and dynamic strings, right? We had a static term that goes into a log
and only a dynamic value that would update. But now we are dumping response headers and printing whole bodies, and it's no longer just controlled information going into our logs. That's where the lack of control comes in, and you don't know what's in your logs anymore. >> And those logs can hold a lot of information. If I do a quick grep for "ey", I can tell you that I'm going to find a JWT token. There is not a log that I haven't found that in. Now you might say: it's just a JWT token. But those things are sometimes valid; sometimes they're even valid for 75 years, because the default has been left on. People like me? We're
going to exploit it and use it to our benefit. I've found app keys and secrets that no one knew existed. If I can find how your hosts work, I know what services you run. I know when you are updating, when you are rotating your logs. I know your normal. I'm going to exploit that, and I'm not going to move and behave in an abnormal fashion, because that makes me detectable, right? There's a saying in incident response: know your normal to know your abnormal. Well, as an attacker, I want to know your normal, because I'm not going to do the abnormal. And logs tell me how your system operates. So again: log as if someone is
watching. Now, one of the tools that I like a lot, and use in pen testing and in OSINT, is TruffleHog. Do you know TruffleHog? Well, you should be using it against your logs. There's quite a lot that you can find. TruffleHog is one of my favorite things to use; it is really powerful. So go and find out what the attackers are using and use their toolset against them, because if you know what is in there before they do, you're one step ahead. TruffleHog is a great resource for finding issues in logs. >> Now, logs have many, many problems, from whichever perspective you want to look
at, from a security or a forensics perspective. From the forensic side, and even from the security side, missing or inaccurate timestamps become a real problem for us as forensic examiners. We want a timeline. We want to see who's done what, when, in what process, in what flow. If those timestamps are not there, then trying to recreate that whole narrative is impossible. If the timestamps are inaccurate, or the time changes depending on which system you're on, then just a 30-second difference can change the entire narrative of an investigation. And I think one of the things that, as an incident responder, has been frustrating is the fact that I have
to write parsers for every log set in the same system, in the same organization. It gets to a point where it's not about whether it's JSON, XML or raw data; it's the fact that all three appear in one log, you know, and there's no set schema. If I know what your schema is, I know how to manipulate the data. But if I'm spending incident response time on having to write a parser, we've got a problem. There are many instances where I've had to look at six different logs, and there have been six different schemas within those logs. And when you try to parse data that doesn't have a structure, what happens? You start losing information. So it's very important that
in an organization you have a schema. It's not about the type of log; it's about what fields are in your logs, and that you are consistent in what you have. Do not make the parser give up whilst trying to parse your data. And we understand that this is within an organization; it's not that everybody will have the same logs and the same formats. But being as consistent as possible, with as few variations as possible, makes life easier. A big challenge that we have is that we can have terabytes and petabytes of information to go through. You might think it's ideal to have more information, but what that does is create noise. That noise makes it
really difficult to pull out what is important and relevant to what we're trying to investigate. That over-logging buries everything deep down in terabytes of data, and it extends the investigation. Not only do we then have to develop parsers for the schemas and try to track whether dates and times are correct, we now need to work out what information in there is actually important. >> It's one of those instances where the size of your data lake doesn't matter; it's the clarity of your data. More volume does not make better logging. And this is something that I have found in 16 years of doing this and responding to various different
industries: the companies that have petabytes and terabytes have actually got crap in, crap out. So it does matter what you have in your logs and what you plan to put in them. Now you might say, well, why does this all matter? It matters because I'm going to use that information to find where it's easiest to traverse. If I know your normal, how your users connect, how they traverse, and what services you have, I might actually be able to start chaining attacks. You might have six low CVEs, but when I put those CVEs together in one attack, it changes the complexity and it makes it more
successful. So whilst we think it's only the severe CVEs that matter, if you cluster them together and you know the system intimately, you can start doing some real damage. So I use your logs against you, to behave as your users would behave. This one's probably the one that I get in trouble for the most. It's my soapbox; I'm going to climb on it. Integrity of logs. And I'm not talking about at rest, and I'm not talking about in transit, because we do those things. We use TLS to send logs over a secure protocol so that they're not tampered with. But how many developers and systems have the ability to tell that the logs have been tampered with, to tell that the
logs have been changed? What integrity mechanism have you got over the contents of your data? Something that's come out of my PhD is that a lot of the time we can't tell that our logs are actually a very crap data source, because they have been changed. Often it's in plain text, in JSON or XML, and it's not protected in terms of hashing or chaining with an integrity mechanism. Now, if you're a history buff at all: how were the pyramids made? Do we know precisely how everything was done, or do we have to infer and try a
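The chaining idea mentioned here can be sketched minimally. This is an illustrative hash chain over log lines, not what any particular product does, and the function names are hypothetical: each entry's digest covers the line plus the previous entry's digest, so editing or reordering any earlier line invalidates every digest after it.

```python
import hashlib

def chain_logs(lines, seed="genesis"):
    """Attach a SHA-256 digest to each log line that also covers the
    previous entry's digest, so editing any line breaks the chain."""
    prev = hashlib.sha256(seed.encode()).hexdigest()
    chained = []
    for line in lines:
        digest = hashlib.sha256((prev + line).encode()).hexdigest()
        chained.append((line, digest))
        prev = digest
    return chained

def verify(chained, seed="genesis"):
    """Recompute the chain; any tampered or reordered entry fails."""
    prev = hashlib.sha256(seed.encode()).hexdigest()
    for line, digest in chained:
        if digest != hashlib.sha256((prev + line).encode()).hexdigest():
            return False
        prev = digest
    return True
```

In practice the digests would need to be anchored outside the log itself (for example, signed or shipped off-host periodically); otherwise an attacker who can edit the file can simply recompute the whole chain.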