Knock Knock. Race Condition. Who's There?

Name: Knock Knock. Race Condition. Who's There?
Uploaded: 2025-12-15
Duration: 41 min 34 s

BSides Cape Town · 2025 · 41:34 · 913 views · Published 2025-12

Speakers

Ross Simpson

Tags

Category: Technical

Topic: Vulnerability Research, Web AppSec

Style: Talk

Mentioned in this talk

Tools used

Burp Suite

Platforms

n8n

About this talk

Race conditions are logic bugs where timing matters, leading to unintended consequences that attackers exploit. This talk examines race conditions across games, automation platforms, privilege escalation vulnerabilities, and web applications, using real-world examples from major security vendors to teach developers, pentesters, and security professionals how to spot them in code and systems.

Original YouTube description

Race conditions are everywhere, so why haven't you seen one, and how bad are they really? From cheating at computer games, to flaws in big automation platforms, local privilege escalations, and stealing millions from web apps... even a look at a problem with birthdays, and how to freeze time!

Race conditions are everywhere, but most developers don't seem able to spot (or prevent) them. They're a type of logic bug where timing matters, leading to unintended consequences or effects that attackers often exploit. They're easy to overlook, tricky to fix, and can be devastating. The goal of this talk is to change the way you see code and systems, so that you can identify race conditions yourself. Once you "get it", you'll start seeing race conditions everywhere.

This talk is largely inspired by a work colleague who got frustrated by me regularly pointing out race conditions, and who has now gone on to excitedly identify them himself... and by how even multi-million-dollar "cyber security" platforms we've used make these same mistakes. There should be something to learn for everyone, whether you're a junior developer, a penetration tester, or a seasoned cyber security professional. With a dash of probability theory thrown in, a look at some leading HTTP research by James Kettle (aka "albinowax"), and a look at games for a bit of fun.

About Ross Simpson: Hacker, coder, gamer

Transcript (English)

Hi everyone. Thank you for coming to my talk, titled "Knock Knock. Race Condition. Who's There?" My name is Ross. I'm a hacker, I'm a coder, I'm a gamer. I've been in the IT industry for about 20 years, and I have my OCP. I've spent a lot of my time building microservices and running them on Kubernetes. I do automation and data synchronization, and, like so many people in this room, I now do AI stuff because we have to, right? I've been very fortunate to give some talks at local hacking conferences in the past. They tend to have a bit of a theme, often gaming-related, and today's is no different. I have a website where I blog every few years, so I've got a few articles up there, but it's become a bit stagnant.

Like so many of us, I'm on a whole bunch of social networks these days because we couldn't just have one good one, which means I'm on many and I read and interact with none of them. But you can find me on those three different sites, the dead bird, of course, being the oldest one. I've been part of the community for many years, going back to 2011 and the first BSides we had, which I attended, and I've been in various roles, from co-organizer to volunteer to speaker. I even wrote the scavenger hunt code, which is somehow still running today after a few years. Thanks to Moonake for keeping that going and running the scavenger hunt. One of my prouder achievements is writing a little bit of animation code that ran on the flex capacitor badge many years ago, for those of you who remember it. [applause]

Primarily, what I want to say with this slide is that I'm here in my private capacity as a member of BSides. This is like a second home to me. I haven't missed a single BSides in all these years, so I'm here as one of you. That said, I do have a day job. I work at a company called Integrity360, which you might not have heard of; they're very big in Europe. They have, however, recently acquired Nclose, who were a very regular sponsor of BSides, so you might be more familiar with that name. Integrity360 offers a number of services and sells a bunch of products, which is to say that we have thousands of customer alerts flowing through our various systems, and we provide all kinds of services on the back of that. We also interact with a lot of vendors, some directly and some indirectly through our customers, in a sort of mutual relationship. So we get some pretty good exposure to companies that I would otherwise have no interaction with or no access to.

The problem is, we also come across a lot of problems. Certainly for me, there was this assumption that these massive names in the cyber security industry have everything absolutely nailed down and working amazingly, and the reality is that it's just not true. We'll look at some of those a little later.

So, the reason for this talk: people think race conditions are complicated or magical. They're everywhere, they can have horrible impacts, and they're really hard to replicate or test. I think I can teach you to spot them. Maybe not fix them, maybe not re-architect an entire system, but look at a system, a process, or some code and spot that something's not quite right. And I think that'll make you a better whatever you are: a dev, a pentester, a QA, maybe even a product designer or a product manager. I think this is a useful skill. You just need to think about things slightly differently, and we'll take a look at that.

So what is a race condition? Technically, we should refer to MITRE. MITRE maintains the Common Weakness Enumeration (CWE) list, and things get very technical there. These are just the race-condition-related entries that I found. Race conditions mean all sorts of things in all sorts of different contexts, but we're not going to get into that today. We're going to keep things pretty high level. For me, it's more about you being able to spot them and think differently, as opposed to meeting some really academic criteria.

I'd like to propose that the definition we use for today is: two things happening at the same time. Very, very simple, but that basically encapsulates it. Behind the scenes, you're going to find race conditions often happen around asynchronous programming, parallel systems, and event and stream processing. Two things happening at the same time isn't quite enough on its own, though. Obviously, two things happen at the same time around us all the time; there are two talks happening right now. So that's not the full definition, but it's the takeaway, and we're going to keep referring back to that phrase. What we really mean is two things happening at the same time with some kind of unexpected or negative outcome. That's the point where it becomes a bug, a problem, an exploit, or has some kind of impact. The fact that two humans are born at the same time today doesn't really affect anything; it's not a concern.

There was no real aha moment, but the way I think I learned to spot race conditions goes back a few years, to Diablo I (1996). It's an action role-playing game with single-player and multiplayer modes. You start as a pretty low-level hero with some armor and weapons, and you spend the game completing quests, killing monsters, and collecting items and gold. It looks like this: really the prototype for the genre, but very familiar. You've got a character sheet, you've got certain items, you've got your inventory off to the right, and you've got a belt in the lower middle with two health potions in it. What makes this game a little different, though, is that the gold is actually an item in your inventory. Nowadays it's usually just a counter associated with your character that increases, but in Diablo it was an actual item that you could drop, pick up, and steal from other people.

Back then, websites looked like this. This was peak website design, and this is where you went to get your cheats. One day, somebody posted this cheat for Diablo that allowed you to duplicate items, and this is what it looks like. You take the gold out of your inventory and drop it on the ground. You walk away from it. You hover your mouse over a health potion in your belt and click on it at the exact moment the character is picking up the gold. And something strange happens. The icon stays as the health potion that you've just picked up out of your belt, but the description changes to 100 gold. If you drop that health potion on the ground, it looks like gold and it's labeled as gold. And we have 100 gold in our inventory again, and 100 gold on the floor. So we now have 200 gold.

But if we did it once, can't we just do it again? So let's take the 200 gold and drop it on the ground. We move away to give ourselves a bit more time to move our mouse, and we click on the gold to get the character to pick it up. We put our mouse over the belt and pick up the health potion, and suddenly we have 200 gold on the ground and 200 gold in our inventory, which is a bit strange, but awfully convenient.

It's kind of interesting how this happens behind the scenes. The game actually uses the mouse cursor as a temporary, one-slot inventory. Picking up the gold moves the item into the mouse cursor, but that's the very thing we're populating when we click on the belt. So two things are writing to the cursor at the same time, and they collide. The cursor keeps the icon of the thing clicked on, which is the health potion, but the data backing it, the type of item and its value, is the gold. That gets us our 400 gold.

If you were playing this normally, you might make 2,000 gold in 15 minutes, completing quests, selling items, and so on. But in 15 minutes you could also make 200,000 gold this way, and that's what it looks like. And if you're wondering, it's not just gold that you can duplicate. As I said, your gold is an item just like anything else. For me, the books are the most valuable items in the game. You can normally only buy one at a time from a vendor. A book teaches you a spell forever, at a certain level, and the more books of that spell you read, the more that spell levels up. But getting more than one or two of those is really difficult. So if you buy a single book and clone it seven times, you very quickly become somewhat overpowered. There's something a little amusing here, too: if you fill your inventory with all that gold, as on the left, you actually have no space left for the items you want to buy. So it turns out there is such a thing as being too rich.

Not directly related, but this is BSides and this is a hacker con. Back at DEF CON 32, there was a fantastic talk, "Troll Trapping Through TAS Tools", about exposing Diablo cheating. This wasn't about cheating in the sense of item duplication. TAS stands for tool-assisted speedrun, and there's a whole community around speedrunning games, which is to say playing a game from start to end as quickly as possible, with people competing to see how quickly they can do it. Someone put out a video beating Diablo I and setting a new record, and the people who gave the talk basically disproved it. There are certain things in the game that do or don't happen at the same time, and they did a whole deconstruction of the video and exposed the speedrunner as a fraud. Part of it does touch on item duplication, and it's a really, really good watch. So if you're into games, hacking, or speedruns, go give that talk a look.

That was back in the '90s. Security evolves quite fast; everyone plays catch-up and problems get solved. So if we look at Sea of Thieves, which someone may or may not have spoken about at BSides two years ago, some 30 years later, I'm sure we wouldn't see any of the same things happening. Sea of Thieves is a slightly different game. It plays like a first-person shooter, but it's a pirate sandbox adventure game, so you're seeing through the character's eyes and running around. You're not really concerned with items, because everyone has the same items and equipment to explore with. It comes down to time spent and skill: you sail around fighting skeletons, monsters, and other players, collecting and selling treasure, which ultimately gets credited to your account. So between games, you're building up an overall account balance that you can then use. Traps were added to the game in 2024; the game's been out since, I think, 2018.

It looks like this. On the left-hand side you have the player's inventory; those are the items you are holding or carrying. At the bottom of the screen, although a bit faded out, is a storage crate. Storage crates are items you can find or buy in the game, and, as the name suggests, they let you store items. On the right-hand side is the contents of that storage crate, and the selected item is a trap. It's like a bear trap: you throw it on the floor, another player steps on it and gets trapped in place, giving you an advantage.

Well, some people found out that if you take the trap from the storage crate and throw it to the ground, but very quickly transfer it back to the storage crate, you can get two things to happen at the same time. There's now a trap on the ground in the game and a trap in the storage crate on the right. Two different things happen, the placing of the trap and the storing of the trap, which take different amounts of time and started at different times. But because the player coordinated both activities, like we did in Diablo, both things happen: the trap is placed and the trap is stored. All the player has to do then is go to the trap they placed and pick it up; it lands in their inventory, and they move it across. They've now duplicated an in-game item, 30 years after Diablo. So we have not learned our lessons yet.

After traps were added in 2024, February 2025 came along and spears were added to the game. These are a slightly different type of item; they're more like treasure. You can't put them in your inventory. You find them around the world, and you can pick them up and sell them. You can also throw them at players, because they do additional damage. On the screen, we see there are three of them. That's the spear. However, if you pick up one of these spears and go to one of these weird statue things (I know this is a bit strange), it acts like a kind of bank vault or teleportation mechanism where you can store treasures. Treasures are items you can't carry yourself, because you encounter a lot of them. You store them with this statue and pick them up elsewhere in the game later, so it's how you get items out of this dungeon to where you would sell them.

But as I said, you can throw a spear, and spears do damage to players. So if you throw the spear and store the spear, two things happen. The thrown spear lands back on the ground behind the player, so he's put it back where he found it. But the statue says that you still have a spear, and that one also spawns behind him. So he's now created himself an extra spear. It's not a massive game-breaking bug, but it's definitely an advantage when you're fighting other players trying to make some gold. Just like before, throwing the spear takes a certain amount of time and storing it takes less time. Both things happen at the same time, and the game basically produces two of them.

Still with Sea of Thieves, this is actually going back a little earlier, but it's a different type of thing. They added something called the Burning Blade, a ship that you can find on your map in the game. It's controlled by the computer itself and crewed by skeletons that you go and attack and fight. At the bottom is the level of the Burning Blade; the higher it is, the more it's worth, and we'll get into that in just a second. After you've taken the Burning Blade over, you level it up by doing skeleton camps, which consist of a constellation puzzle. It's really, really easy, but it's a time sink, so it just takes a bunch of time to do. You fight a few waves of skeletons, and then you complete a ritual that wraps it all up and increases your level. You also have to sail around the map to these different camps, so there's a lot of downtime, as it were, limiting how often you can level the ship up. And this is the Burning Blade, which you can sail in the game, because it looks so cool.

This is what the constellation challenge looks like. It's like a magnifying glass, and you draw shapes in the stars, after which a bunch of skeletons come and attack you. So I'm going to play a video. Not me cheating; don't ban my account. But something interesting happens. You complete the constellation puzzle, you fight a bunch of skeletons, and then you get into this room where you have to light four little fires. Then you get this orb, and this orb is what lets you complete the ritual, as you can see from the label. Some cheaters realized that if you turn your Wi-Fi off, you can just spam the button to use the orb: use the orb, use the orb, use the orb. It builds up a buffer of these ritual-complete event messages that go nowhere, because they never reach a server, and so the server never tells the game to deactivate the item. You can just keep spamming it, and eventually you switch back to your Wi-Fi settings and turn it back on before the game disconnects, which we'll see in just a moment.

So he turns the Wi-Fi back on after about 25 seconds. The game hasn't dropped him yet; he's in a kind of desync state. The banner at the bottom has nothing to do with it, but you can see the animation: the orb is completing. So he's finished the ritual just once, but what does that look like? Well, suddenly he's level 50. No more five-minute time sink on constellations, no more fighting skeletons, no more 10 minutes sailing around the map. What does that look like in numbers? Well, you can make 26,000 gold in 15 minutes doing each of these. It's a fixed amount no matter your level, so some people play up to level 50, spending hours just doing this. But you can make 1.3 million in 15 minutes. And having completed that skeleton camp, you can sail the 10 minutes and spend the five minutes to unlock the next one, but now you're adding multiples of 1.3 million each time. That's quite a change. And there's the ship again, because it's just so darn cool.

That means we can change our definition slightly, from two things happening at the same time to two or more things happening at the same time, because we've just seen more than two things happen at once.

So, let's put it to the test. A while ago I did a penetration test for a friend's company, where they had built one of these spin-to-win sites. If you've been on Facebook and accidentally clicked an ad in the last two years, you'll recognize that spinner. The way it works is pretty straightforward. Your browser makes a web request to some backend system. You see a nice animation, but behind the scenes the backend is choosing whether you get a reward and telling you the number of spins you have remaining. So we had two spins, we've used one, and we've got one spin left. But we just learned a trick from Sea of Thieves: what if we send through a whole lot of requests at once? What if we can spin three wheels with one spin remaining? Some of them win us no prizes, but hey, look, we did win one of those prizes.

So how does that work? We fire off a ton of requests, because this is all timing-related; the time element is the key aspect of race conditions. Some requests get through right away with one spin remaining, some get through with zero, and some fail. So it's a bit of a mixed bag: here we've got seven requests and three different outcomes across them. The code to do something like this is surprisingly simple. Here is some Python using an async library: in the main function, you just say, "Hey, make nine requests in parallel," the requests get made, and hopefully something magic happens.

See, the thing is, the web does some funny stuff when handling requests. If we've got those four requests next to the computer, the requests have to leave your machine, go over the internet, find a path, and take various network routes to reach your target, like the load balancer there in the middle. Some get there really quickly; some traffic might get lost. All kinds of weird things can happen. Maybe someone in your home is downloading or streaming something, so not everything flows quite as planned. Load balancers themselves sometimes do weird things, too. You might have a whole lot of backends that the load balancer needs to distribute to, and it doesn't always distribute evenly. So some requests might land right away on one of your web servers and be executed while the others haven't even arrived yet. Eventually they all get there, and what we get is some kind of response: either we do or we don't get extra spins snuck in.

The reason this works is that as those requests are being processed, the code is doing something behind the scenes, and it doesn't run as instantly as we'd like. Ultimately, it's going to do something like: check the user is logged in, check how many spins they have remaining, if they have spins remaining do the spin, and then do the update. The thing is, if the requests arrive close enough together that none of them has changed the number of spins remaining yet, multiple requests can see the same spins as available and apply the same logic. You can see that with the dotted line. The first request on the left completes as the third request is coming in, and at that point it has depleted the spins. The second one is kind of in the middle: it did its check before the spins were used up, so it thinks you still have a spin, even though that has since changed. So it also completes, giving you a bonus spin. But the last one, unfortunately, arrives a bit too late, so it doesn't succeed.

But what if we could freeze time? What if we could take the randomness out of play? That's what we're going to do now. Well, not us. There's this guy, James Kettle. He works at PortSwigger, the company that makes Burp Suite, and he is an absolute legend. He continually produces amazing content, gobbles up RFCs, and is just next level. He gave a talk at DEF CON 31 called "Smashing the State Machine", all about web race conditions, in which he revealed the single-packet attack he came up with. It's a bit technical, but I think it's really cool, so we're going to take a look at it.

This is his artwork, which I've stolen, illustrating two requests, and I've tried to illustrate the same thing visually. There's a degree of network latency as a request goes out, and there's jitter; there are a few speed test and jitter test tools online. The basic idea is that stuff happens on the network, and it can affect how and whether packets arrive. Then there's a degree of internal latency, and what we really want is for those red circles to line up: that's where the code is going to trigger something. If we can get them perfectly aligned, the requests will probably all do the same thing, but that's not so easily done.

So here's what we want. What if we could send through 30 requests, 30 spins, and make them all evaluate the code at exactly the same moment, even though we have no idea what the infrastructure or the code looks like or how any of it works? We need a few pieces for this, and a bit of background. First, HTTP/1, not to be confused with Web 1.0. With HTTP/1, requests are sent sequentially over a single connection, or over multiple connections. For what we did earlier, we had to spin up multiple connections, which potentially all take different routes, and that adds to the chaos. There's one request, then another request, then another, running one after the other. With HTTP/2, however (again, not to be confused with Web 2.0; we're talking about the actual network protocol), it turns out you can send multiple requests concurrently over a single connection. That's more efficient, and we can do some stuff with it.

Back in 2019, James Kettle also came up with the last-byte sync technique. It holds back the last byte of each request, which lets you deliver a potentially huge payload ahead of time, possibly across multiple connections if you've got megabytes of data to transfer. You ship almost everything across, and then you coordinate the exact moment the requests take effect by sending the final bytes. That's not exactly how it works, but to illustrate, I've left the "e" off the word "alive" there, to give the idea that it's not quite a complete web request; there's a bit missing.

Another part we need is Nagle's algorithm, which goes back to 1984. I think sometimes we forget how old our technology and protocols are. Nagle's algorithm applies to TCP, not UDP and not ICMP, and it doesn't apply across all of your connections; it's a per-connection thing. I mention that because you might find yourself wondering how it doesn't break all of our networking. What it solves is the small-packet problem. Say you need to send one byte of data over a Telnet connection. The TCP and IP headers take 40 bytes, so you pay the price of 41 bytes to deliver one byte, which is not very efficient. Nagle's algorithm solves this by buffering small outgoing writes and combining them into a single packet, again per TCP connection, not across all of your networking, otherwise things would get held up. Most modern operating systems have it built in and enabled.

What James figured out is this: you have several different web requests, each a different color in the image there, spread across multiple TCP packets, and you hold back a final byte of every request. You send this massive payload through a single TCP connection, put it all on the other side of the fence, and make it the other side's problem. Then you send a single TCP packet, not one per connection, because we're dealing with one connection: one TCP packet at the very end, which ships the missing bytes for all of the requests we've effectively preloaded on the other side. So what it looks like is probably something like this. The requests leave, as before, and reach the load balancer, some taking a bit longer. That's us preloading all the data: all the requests, all the spins we want to do, perhaps 30 or 100 of them. And then we push the magic button.
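To make the spin-to-win race concrete, here is a small, self-contained Python sketch using asyncio, in the spirit of the async request script mentioned above. It is a toy under stated assumptions. `SpinBackend`, `handle_spin` and `fire_parallel_spins` are hypothetical names. The backend is an in-memory counter, not the real site. `asyncio.sleep` stands in for real processing latency between "check spins remaining" and "do the update". In this toy, every simulated request passes the check before any update lands. That is the perfectly aligned case the single-packet attack tries to approximate over a real network.

```python
from __future__ import annotations

import asyncio


class SpinBackend:
    """Hypothetical in-memory stand-in for a spin-to-win backend."""

    def __init__(self, spins: int, use_lock: bool) -> None:
        self.spins_remaining = spins
        self.prizes_awarded = 0
        self.use_lock = use_lock
        self._lock = asyncio.Lock()

    async def _check_spin_update(self) -> bool:
        # Step 1: check how many spins the user has left.
        if self.spins_remaining <= 0:
            return False
        # Step 2: "do the spin". The sleep stands in for real latency
        # (database round trip, prize logic). While this request is
        # suspended here, other requests can run their own check.
        await asyncio.sleep(0.01)
        # Step 3: update the counter, based on a check that may now be stale.
        self.spins_remaining -= 1
        self.prizes_awarded += 1
        return True

    async def handle_spin(self) -> bool:
        if self.use_lock:
            # Only one request at a time may run check -> spin -> update.
            async with self._lock:
                return await self._check_spin_update()
        return await self._check_spin_update()


async def fire_parallel_spins(use_lock: bool, requests: int = 9) -> tuple[int, int]:
    """Send `requests` concurrent spins at an account with one spin left.

    Returns (successful spins, spins remaining afterwards).
    """
    backend = SpinBackend(spins=1, use_lock=use_lock)
    results = await asyncio.gather(*(backend.handle_spin() for _ in range(requests)))
    return sum(results), backend.spins_remaining


if __name__ == "__main__":
    print("unprotected:", asyncio.run(fire_parallel_spins(use_lock=False)))
    print("with lock:", asyncio.run(fire_parallel_spins(use_lock=True)))
```

Without the lock, all nine requests win and the balance ends at -8. With the lock, exactly one wins and the balance ends at 0. The lock here only serializes requests inside one Python process. With several web servers behind a load balancer, the serialization point has to be shared. A database transaction with `SELECT ... FOR UPDATE` provides that shared point.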

And that magic button sends a single little TCP packet out into the big scary world of the web, which eventually reaches its destination and completes all of the requests that we have bundled up. Those then fire almost immediately to the servers and drastically closes the gap of timing that we have to deal with to try and get things to execute in parallel. Well, that's a great idea, but who here is going to go home and start playing with Nagel's theory or implementing this stuff? Again, James Kettle's got you covered because why make you do work if he can just ship it straight into Burpswuite. If you don't know what Burpuite is, it's a tool that

sits between your web browser and the website. Gives you a list of everything that your web browser is doing. It allows you to very easily pick one of those and send it to repeater. Repeat is one of the many tools in Burpswuite. Works a bit like Postman. It gives you the request. You can make changes to it and you can send as many as you want on an ad hoc basis. Very good even for devs in my opinion. Just debugging, testing. I I live in Burpsweuite even just doing regular dev work. It seems a little bit counterintuitive, but from within repeater, you can send a request to repeater. It really just duplicates these. Those are the tabs at

the top. You can see I've got three tabs open. I'm going to add another one to get four tabs. You can then right click on the tabs and you can send them to a tab group. And here we have, so I've done that for all four. I've got group number one with four requests in representing those four. And normally in repeater, you'd click the orange send button, but there's a little drop down next to it. And if you use that drop down button, hey, the last bite sync attack is built right in. So you can just ask Burpuite, hey, I want to do all these really crazy weird things. Here's all the bits and pieces. Just make it

happen and off it goes and all the requests uh will use this method and end up a lot more successful. So James couldn't have made it any easier for us, right? I mean, it's all right there. It's a few right clicks. But no, James is just a different kind of guy. So he built a custom action in Burp Suite. You don't even need to pick your requests. You don't even need your tab group. you could just say, "Hey, go find me race conditions in the site." Um, yeah, pretty darn useful. So, just amazing research from him. I am a huge fan. Please go check out his stuff. And in terms of web race conditions, it can

affect so much. Some of the common obvious ones are things like gift cards, reusing gift cards, reusing discounts, maybe you want to like review bomb someone with a whole lot of ratings on your best or worst uh restaurant. What about cash? withdrawing cash. Things can go very very wrong in that realm. Or captures. There's quite a big market in the underground for bypassing and beating captures. But captures themselves are kind of moment in time. They're generated. There's a challenge that has to be answered and then they rolled. But if you can answer them at the same time with the same value, you can potentially sneak by a whole lot of captures with just one value. And also

rate limits. Rate limits. Look at a certain moment in time. How many requests have you made? How many are you allowed? Should we allow you or not? If those all execute in parallel at the same time, suddenly something that lets you make say five requests a second. Maybe you're pushing through a thousand requests in a second and someone's scratching their head wondering why the rate limiting doesn't work. Again, it's things executing at exactly the same time, not aware of each other. Burpuite in their wisdom, or I should say Port Swigers have provided a really great academy. It's like a little market e-commerce store with a challenge. You have $100 a store credit and you have to

buy an item with $1,337. And using what we've just spoken about, there is actually a way. So think about that for a moment. You're more than 10xing your value on an e-commerce store. It's just a lab. But what about other e-commerce stores? But we're not all bad and evil here. I think it's worth knowing that stuff if you're a defender or developer. But let's let's lean a little bit into the securing of things. Some of you may already know this, but some of you might be totally new. Most of what web apps do these days deal with some kind of a database. If you're using a relational database, you can use something called a transaction. It tells

the database that everything that's about to happen belongs together as one unit until I say I'm done. So you wrap up a whole lot of commands. The second part is that when you're selecting the data, like checking the user's balance, you say: I want to read this, but I'm going to be changing it, so don't let anyone else touch it until I'm done. That immediately stops multiple requests from doing the same comparison at once. Figuring out how is the database developers' problem; they've given us this functionality and guaranteed that it works, so you don't need to second-guess them. And at the very end, when we've finished checking the number of spins available and changing the number of spins

available, we commit. We say: right, I'm done, make all the changes I asked for, and then you can go and serve the next request. So it looks something like this: begin the transaction, run your query with FOR UPDATE, do your update somewhere in the middle, and then commit. That means the second request, the one we baddies just used a really cool attack to send through at the same moment, has to wait until the first one commits, and by then it sees the updated value. The database forces something parallel and asynchronous to run in a very ordered way. Yes, you get database locks, and you can get all kinds of scary spikes on your graphs, but that's a lot safer than

losing a whole lot of money. The database makes sure that only one transaction at a time can lock and modify that row, forcing parallel work into what is basically a single queue, and that keeps us safe. Last year at DevConf (they're a community sponsor, another great conference), a guy called Paul Edward gave a talk on race conditions, which made me a bit jealous, but he did a really good job. It's about how they lost millions through race conditions in their fintech app. It gives some really good examples and goes through the different routes, but the key premises are the same: database locking and database transactions. It's a really good talk. If you want to know more, if you

want to know how to secure your app, or if you want to see the types of things that can go wrong in more detail, please give it a watch. But you can't always use database locking or transactions. It's common these days to have multiple APIs, pods, or servers backed by multiple databases that are actually just read replicas. You can't lock a read replica, and locking one wouldn't help anyway if there's another replica right next to it. Asking the poor database cluster to handle all of this becomes a nightmare. You could connect to your primary database and do the locking there, but there's a better way: there's something called Redis.
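Before getting to Redis, the single-database fix described above can be sketched in Python with the standard-library `sqlite3` module. SQLite has no `SELECT ... FOR UPDATE`, so this sketch swaps in `BEGIN IMMEDIATE`, which takes the database's write lock before the balance is read; the effect is the same, in that the check and the update happen under one lock. The `accounts` table, the 100 of credit and the 60 purchase are made-up values, and the `time.sleep` only widens the race window so the unsafe version loses reliably, much like a burst of parallel requests would. The same check-then-act shape sits behind reused gift cards, discounts and rate limits.

```python
import os
import sqlite3
import tempfile
import threading
import time


def make_db(credit):
    """Create a throwaway database holding one account (hypothetical schema)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, ?)", (credit,))
    conn.commit()
    conn.close()
    return path


def read_balance(conn):
    # fetchall() runs the query to completion, so SQLite releases its read lock.
    return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchall()[0][0]


def spend_unsafe(path, amount):
    """Check-then-act with no lock: parallel requests all see the same balance."""
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)  # autocommit
    try:
        balance = read_balance(conn)
        time.sleep(0.1)  # widen the gap between the check and the write
        if balance < amount:
            return False
        conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (balance - amount,))
        return True
    finally:
        conn.close()


def spend_locked(path, amount):
    """Same logic, but the read and the write share one locked transaction."""
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock *before* reading
        balance = read_balance(conn)
        time.sleep(0.1)
        if balance < amount:
            conn.execute("ROLLBACK")
            return False
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
        conn.execute("COMMIT")
        return True
    finally:
        conn.close()  # closing with an open transaction rolls it back


def race(spend, credit=100, amount=60, requests=10):
    """Fire `requests` purchases at once; return (successes, final balance)."""
    path = make_db(credit)
    start = threading.Barrier(requests)  # release all threads together
    results = []

    def worker():
        start.wait()
        results.append(spend(path, amount))

    threads = [threading.Thread(target=worker) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    conn = sqlite3.connect(path)
    balance = read_balance(conn)
    conn.close()
    os.remove(path)
    return results.count(True), balance


print("unsafe:", race(spend_unsafe))  # typically 10 "successful" purchases, balance 40
print("locked:", race(spend_locked))  # locked: (1, 40)
```

On a server database such as PostgreSQL, `spend_locked` would instead run `SELECT ... FOR UPDATE` inside the transaction, which locks just that row rather than the whole database file.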

Redis is a really useful service, normally used for caching, if you haven't already used it. It lives on the network, applications can talk to it, and it can be used for distributed locks in exactly the scenario where a lock can't be held inside a single database. So it's a really useful tool in this age where we've got agentic AI doing things all over the place, distributed cloud lambdas, and stuff running around everywhere else. So please do take a look at Redis for distributed locking if you need to solve some of these problems. So where does that put us with our definition? We've seen a few things, but "two things happening at the same time" still remains

fairly true. But what if they're almost at the same time? That was exactly our problem earlier: things happening almost, but not quite, at the same time stopped our attacks from working. But things can also go wrong in the other direction. Here we've got a little front-end app with a drop-down where you pick your country, and we list some of the provinces or cities in it. But say we select several in a row: we select America, and while it's loading, we select England, and while that's loading, we select South Africa. At the bottom, the data starts coming back. We've got America, and we've got South Africa, which is what we're looking for. Oh, wait. How does that work? London is

not in South Africa. That happens because the browser makes an Ajax request, a web call, to fetch the data, and we've just looked at how network connections can have all kinds of weird things affecting them. So you've intentionally sent three requests as a user, but they may have landed and been answered at different times. Newer requests might be for less data. If you're listing all the cities in a massive country versus a tiny one, just think about the number of database rows to fetch or the amount of data to return: a much smaller, faster answer might be sent back to your browser before the

bigger one, even if that's not the order they were requested in. The same goes for typeaheads, autocompletes, and filters on tables. Say you've got a table of, I don't know, financial data or tickets, and you filter by priority: first you want to see all of them, then your highs, then your criticals. You might only have one critical row, so by its very nature that's a much smaller, faster response to send back. The database itself also does some magic: it caches. Maybe someone came before you and asked for the cities in a certain country, so the database can respond

far faster than it could with an actual lookup. The impact of this is probably pretty negligible: some confusion and frustration, and things will probably start erroring if somebody submits a form with the wrong country and city. The solution is pretty simple, though. Keep a counter of how many requests the front end has made, increment it every time you make a request, and tell each handler, "Hey, you're request number 54," or 55, or 56. When the data comes back for number 54, if that's not the current value of the counter, meaning it's not the latest request, don't use it. Discard

it. That way the user at least stays on a loading spinner or something like that, rather than seeing different sets of data flash in out of order. Things happening almost at the same time, though: let's think about e-commerce. Say you're on a popular shop. You add, uh, 650 rand of stuff to your cart because that way you get free shipping. You get redirected to some kind of payment gateway, put in your credit card details, and place your order. What could go wrong? This happens every single day, and we all use it. Well, what if you get partway through that process, reach the payment page, and realize you forgot an item? You could click back and

go back to the site, carry on browsing, and add it to your cart. But what if you open a new browser tab instead? While one tab is waiting for you to pay for your order, you add something else to your cart, like, I don't know, a super valuable diamond. And then you complete your payment. We've just done two things, not at the same time, not perfectly in parallel, but we've got two branches of processing. Modern sites generally don't have this vulnerability, but it's certainly something that has happened: you effectively check out more than you paid for, just because the site marks the whole cart as paid. Obviously, financial loss is the most likely impact there.
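A minimal sketch of that two-tab flaw, using a hypothetical in-memory `Shop` and `Cart` (prices in rand; the diamond's price is made up). In the vulnerable flow the cart stays editable after checkout, so the amount charged is the old total while the shipment is whatever the cart holds when payment lands. The locked flow freezes the cart at checkout and moves any later additions onto the next cart number, the same cart-100/cart-101 idea used for the fix.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    cart_id: int
    items: dict = field(default_factory=dict)  # item name -> price in rand
    locked: bool = False

    def total(self):
        return sum(self.items.values())


class Shop:
    """Hypothetical single-user shop with just enough logic to show the bug."""

    def __init__(self):
        self.cart = Cart(cart_id=100)
        self.shipped = []

    def add(self, name, price):
        if self.cart.locked:
            # This cart is awaiting payment, so start the next cart instead.
            self.cart = Cart(cart_id=self.cart.cart_id + 1)
        self.cart.items[name] = price

    def checkout(self, lock):
        """Send the current cart to the payment gateway; optionally freeze it."""
        self.cart.locked = lock
        return self.cart, self.cart.total()

    def payment_succeeded(self, order):
        # Ship whatever the paid-for cart contains at the moment payment lands.
        self.shipped = list(order.items)


def two_tabs(vulnerable):
    shop = Shop()
    shop.add("groceries", 650)
    order, due = shop.checkout(lock=not vulnerable)  # tab 1: payment page
    shop.add("diamond", 50_000)                      # tab 2: keep shopping
    shop.payment_succeeded(order)                    # tab 1: pays `due`
    return due, shop.shipped, shop.cart.cart_id


print(two_tabs(vulnerable=True))   # (650, ['groceries', 'diamond'], 100)
print(two_tabs(vulnerable=False))  # (650, ['groceries'], 101)
```

Both runs charge R650; only the unlocked cart ships the diamond as well.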

The solution again comes down to locking, but in e-commerce you can lock things slightly differently. You can lock the cart; you don't necessarily need to lock the database. We don't have a concurrency problem here, we have an ordering problem. So you can lock the cart, or what sites tend to do is say, "Right, when you hit checkout, that's cart number 100. We're sending you off to pay for cart 100, and cart 100 can no longer have items added. The next time that user adds something, they're on cart 101." So we separate the checkout process from adding to the cart. Other things can happen at nearly the

same time. We've got Leon in our audience, who's doing a talk in this track a little later, and he also spoke at DevConf about a bunch of vulnerabilities he found. One of them involves a race condition, and here's what happens. Some update software gets fed an executable file and told that this is an update it should run. It verifies that the file is something it trusts, something signed, and decides whether to run it or not. So it decides: yes, the file you've given me looks really good. But what if Leon replaces the file just in time with something malicious? The app has already decided the file is safe to run, so what's it going to do? Well, it runs

what it thinks is the first .exe, but that file has actually been swapped. These kinds of race conditions get a whole lot worse. One of my favorite exploits ever is Dirty COW, I think just because it's got a cool little logo and a cool name. It goes back to 2016 and affected basically all Linux and Android systems, so a huge spread, and it too came down to memory management and a race condition. A few years on, we've got sudo having race conditions, and Windows in 2025, same thing. Shared resources, race conditions, all kinds of things happen. I mentioned that we use a lot of security tools from a lot of security vendors, so let's take a quick look.
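Leon's update bug is a classic time-of-check to time-of-use (TOCTOU) race: the file is verified by path, then used by path, and the path can point at different bytes in between. A minimal Python sketch, assuming a SHA-256 allow-list stands in for real signature checking (`TRUSTED`, the payloads and the function names are all hypothetical). Nothing is actually executed; each installer just returns the bytes it would run. The safer version reads the bytes once, verifies those bytes, and only ever uses that same copy.

```python
import hashlib
import os
import tempfile

GOOD_UPDATE = b"legit signed update"
EVIL_UPDATE = b"malicious payload"
# Hypothetical stand-in for signature verification: an allow-list of hashes.
TRUSTED = {hashlib.sha256(GOOD_UPDATE).hexdigest()}


def is_trusted(data):
    return hashlib.sha256(data).hexdigest() in TRUSTED


def install_by_path(path, between=lambda: None):
    """Vulnerable: check the file, then reopen it by path to use it."""
    with open(path, "rb") as f:
        if not is_trusted(f.read()):
            raise PermissionError("untrusted update")
    between()  # the attacker's window to swap the file
    with open(path, "rb") as f:
        return f.read()  # what would actually be executed


def install_by_content(path, between=lambda: None):
    """Safer: read once, verify those bytes, and only use that same copy."""
    with open(path, "rb") as f:
        data = f.read()
    if not is_trusted(data):
        raise PermissionError("untrusted update")
    between()  # swapping the file now has no effect on `data`
    return data


def swap(path):
    with open(path, "wb") as f:
        f.write(EVIL_UPDATE)


def demo(installer):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(GOOD_UPDATE)
    try:
        return installer(path, between=lambda: swap(path))
    finally:
        os.remove(path)


print(demo(install_by_path))     # b'malicious payload'
print(demo(install_by_content))  # b'legit signed update'
```

In a real updater, "use that same copy" usually means copying the file somewhere the attacker cannot write (or holding a handle that prevents replacement) before verifying it; the sketch only shows the ordering.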

Let's say we need to collect a whole lot of our customers' security alerts from their various tools: Microsoft Sentinel, Splunk, things like that. A lot of these tools support webhooks, which means they'll send us a notification when there's an alert. Others would rather just keep a list and have you use the UI, but they provide an API. So sometimes we receive, and sometimes we need to fetch. Then there's this really big company, worth $150 million, doing agentic AI automation for security, which is super impressive, right? They're going to solve our problems. 200 employees; this is the way to go. And they say, "Oh, we've already got our own automations and

integrations, so we'll just run them every 5 minutes to fetch the new alerts from the products you need to collect from." Now, we've pictured n8n there, which is a super cool open-source tool. I'm not hating on them; they're not the guilty party, they're just the stand-in so I don't get sued. Very cool tool to look at, though. So this vendor tells us: okay, what you need to do is set up a 5-minute cron job. That's a schedule: every 5 minutes, run and ask for all the alerts you haven't collected yet. That sounds pretty good. And I said: what

happens if, let's say, the 8:10 run starts while the job that started at 8:05 is still running, because it had so much work to do or the internet's just slow? Now we've got overlap, and both jobs are collecting the same alerts. And they said, "Yeah, that wouldn't be very good." And I said, "No, it wouldn't." So I said, "Okay, what about a start and end range? What if the 8:10 run doesn't fetch anything earlier than 8:05, because the job at 8:05 would have looked back to 8:00?" Well, that's pretty good; we've got these very tidy windows. Then I said, "What happens if the job at 8:10 fails?" You

know, we get outages; Cloudflare. And they said, "Oh." I said, "So what collects the data between 8:05 and 8:10?" They weren't sure. So we didn't use them. But don't worry, there's another, much smaller vendor, about 50 people, worth about $500 million. Security orchestration, automation, and response (SOAR) is their whole thing. They've got this patented threat-centric technology that automatically groups all the alerts it brings in, so you might have a medium alert, a critical alert, and an informational alert, and they all get bundled up into a case. Again, we've got n8n as the stand-in; they're not the guilty party here, it's just to give the idea. So a case can be made up of many alerts. Each

alert can run an automation. Automations can make decisions, such as asking VirusTotal whether this thing is malicious or not. If it is, tag the case as malicious; if not, tag the case as not malicious. But we have a problem: we've got multiple alerts in a case. Why would we tag the whole case as not malicious just because we see one not-malicious thing? The other two things might be malicious. Luckily, that couldn't happen... except this exact example is in their training material. So imagine the confusion: your SOC analyst is looking at a case tagged both not malicious and malicious. What are they supposed to do with it? Still, those are just tags; they don't really matter.
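The tagging problem comes from each alert's automation deciding for the whole case based on its own alert alone. A minimal sketch with a hypothetical three-alert case and made-up verdicts: `tag_per_alert` reproduces the contradictory tags from the training example, while `tag_from_case` derives a single verdict from every alert in the case, so one malicious alert is enough to mark the case malicious.

```python
# Hypothetical case: three alerts and the verdict each alert's lookup returned.
CASE_ALERTS = [
    {"id": 1, "severity": "medium", "verdict": "malicious"},
    {"id": 2, "severity": "critical", "verdict": "malicious"},
    {"id": 3, "severity": "informational", "verdict": "not malicious"},
]


def tag_per_alert(alerts):
    """Each alert's automation tags the whole case from its own alert alone."""
    tags = set()
    for alert in alerts:
        tags.add(alert["verdict"])
    return tags


def tag_from_case(alerts):
    """Look at the whole case: any malicious alert makes the case malicious."""
    if any(a["verdict"] == "malicious" for a in alerts):
        return {"malicious"}
    return {"not malicious"}


print(sorted(tag_per_alert(CASE_ALERTS)))  # ['malicious', 'not malicious']
print(sorted(tag_from_case(CASE_ALERTS)))  # ['malicious']
```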

In the official training, again, there's a scenario where an alert comes in, gets evaluated, and they want to change the priority. Maybe something looks like a medium or a high, but it's actually benign, so we want to downgrade it. In the command palette of options you can choose from, there's an option to set an alert's priority. This is really convenient, because the case automatically takes on the priority of the highest alert within it, so it auto-sizes. But in the training video they highlight "set alert priority" and then they highlight "set case priority". I just told you that the case priority is auto-calculated until you override it with "set case priority". So when they see an executable that VirusTotal lists as not malicious, the whole case is not malicious: lower the priority.

Well, that doesn't really help us very much, and it doesn't help our customers. But that's just priority; luckily, it couldn't get any worse. Oh, wait, it can: more training material. The other thing you can do is close a case if it's not malicious. Imagine if we went to our customers and said, "Hey, good news. We brought in a critical case. It was made up of one medium alert, one critical, and one informational. But don't worry, we tagged it as not malicious, low priority, and then we closed it for you. Pay no attention to the critical and the medium." So yeah, not the best security tool, in my opinion.
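Priority shows the ordering side of the same mistake: once automations call "set case priority" with their own alert's view, the case ends up with whichever automation happened to finish last. A minimal sketch with hypothetical alert names and priorities: `override_flow` lets each automation overwrite the case priority, while `derived_flow` lets automations change only their own alert's priority and recomputes the case as the highest of them, which is the auto-calculation the product already offers. Checking every possible finishing order shows the difference.

```python
from itertools import permutations

RANK = {"informational": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}
# Hypothetical alerts and the priority each automation settles on for its own alert.
ADJUSTED = {"alert-1": "medium", "alert-2": "critical", "alert-3": "low"}


def override_flow(finish_order):
    """Each automation calls 'set case priority' with its own alert's view."""
    case = None
    for alert in finish_order:
        case = ADJUSTED[alert]  # last writer wins
    return case


def derived_flow(finish_order):
    """Automations only set alert priority; the case is recomputed from all alerts."""
    alert_priority = {}
    for alert in finish_order:
        alert_priority[alert] = ADJUSTED[alert]
    return max(alert_priority.values(), key=RANK.__getitem__)


orders = list(permutations(ADJUSTED))
print(sorted({override_flow(o) for o in orders}, key=RANK.__getitem__))  # ['low', 'medium', 'critical']
print({derived_flow(o) for o in orders})  # {'critical'}
```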

Less important, but we also have to generate a description for a case made up of all these alerts. Multiple alerts come in, each with its own description, and we need to loop over them, collect them, and push everything back up to the case level. So it looks like this: a basic automation to update the case. The second alert's automation is going to update the case, and the third one is going to update the case too. You can see that if those are all running at the same time, even though they start at different times and the automations might have different steps, they might still perform the same action at the same moment. We get all kinds of weird corruption doing

that. Obviously, the impact of this could be pretty huge: missing security alerts or dismissing them, so a compromise could go unnoticed and company data could be leaked. The solution: zero trust toward your vendors. Some of them are great. Some of the products are great. Some of the people are great. But do you know how it works and why it works, or do you just follow the training and do what it says? Maybe a bit of a hot take, but maybe vendors should focus on fundamentals a bit more than on hype and AI. Just a thought. Um, sorry, I've got myself muddled.
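The description problem is a lost update: each automation reads the case description, appends its alert's text, and writes the whole thing back, so automations that overlap silently drop each other's text. A minimal single-process sketch with a hypothetical in-memory `Case`; the `sleep` stands in for the automation's other steps. The fixed version holds a lock around the read-modify-write. Across multiple workers or servers a `threading.Lock` is not enough, and you would need a shared lock such as the Redis-based distributed locking mentioned earlier.

```python
import threading
import time


class Case:
    """Hypothetical case record shared by several alert automations."""

    def __init__(self):
        self.description = ""
        self.lock = threading.Lock()


def append_unsafe(case, text):
    current = case.description         # read
    time.sleep(0.05)                   # the automation's other steps run here
    case.description = current + text  # write back, clobbering concurrent updates


def append_locked(case, text):
    with case.lock:                    # one read-modify-write at a time
        current = case.description
        time.sleep(0.05)
        case.description = current + text


def run(update, alerts=("[medium] ", "[critical] ", "[info] ")):
    case = Case()
    start = threading.Barrier(len(alerts))

    def automation(text):
        start.wait()                   # the alerts arrive together
        update(case, text)

    threads = [threading.Thread(target=automation, args=(a,)) for a in alerts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return case.description


print(repr(run(append_unsafe)))  # usually only one alert's text survives
print(repr(run(append_locked)))  # all three alerts, in whatever order they ran
```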

Yeah. And then determine the state. Don't just blindly change the case's state based on one alert or one automation. Reflect back at the end and say: right, I want to change the description or the priority; looking at the picture as a whole, what should it be? That way, in theory, if you've got three different alerts with different automations running, the first one might only see the medium and say the case should be a medium, or take a certain set of actions. The next one might see the critical and say, "No, no, no, looking at all the evidence, I make a different decision." And ideally, the informational or low alert

should say, "Well, I want to set it as low, but it's already a critical. Who am I to shut this thing down?" But there's a bit of pushback to all of this: what are the chances? What are the chances multiple alerts are going to come in together? What's the chance a low is going to come in with a critical? What are the chances of web requests being received at exactly the same time, security alerts happening at once, overlapping events? And this is the next cool, fun thing: the birthday problem. It asks: what is the probability of two people sharing a birthday? Think about it for a second. We can all do some basic maths, and I propose to you that,

well, people have birthdays, and alerts or web requests have start times. With birthdays we're asking whether people share one day, but with alerts we might be asking whether they share, say, a 10-minute window. So we can say, okay, a person's birthday falls on one of 365 days, and 1 over 365 gets us a 0.27% chance. An alert's 10-minute window out of a day's worth of events (10 out of 1,440 minutes) gets us about a 0.7% chance of this happening. So you're right: what are the chances? Incredibly small. But then we read a little further on Wikipedia, and it says only 23 people are needed for that probability to exceed 50%. Well, that doesn't make sense: 0.27% times 23 people gets us about 6%. We've gone wrong somewhere.

How are we off by almost 10x? You see, the challenge is that we need to look at the comparisons between every possible pair. Every alert coming in needs to be compared against every other alert coming in at around the same time; we can't just pick one. Here's the birthday analogy: we can't just pick one person. You can't pick your birthday. You can't pick the 1st of January and ask how many people in this room have a birthday on the 1st of January. It's not about me, and it's not about the date. It's about all of these people. If we were to split everyone by month and then by the

day of the month, I think we'd see some really interesting grouping. Because, you see, it's not the seven comparisons on screen that determine the odds; it's the 25. Every person has to be compared with every other person, and then the next person compared with every other. So instead of seven, we've got 25 comparisons, which is a lot more chances for something to happen. But I'm not very smart, so we'll rely on other people's maths. There it is: with five people, there's a 2.7% chance that two of them share a birthday, on some random date; we're not choosing the date. By the time you've got 70 people in a room, and we've got quite a few more than that here, you're at more than a 99% probability

that two people share the same birthday. So we've probably got, I don't know, eight people sharing birthdays here, if not more. And if we extrapolate that to alerts: at 20 alerts or events, there's nearly a 20% chance that some of them collide and happen at the same time. If you're dealing with 100 alerts, you're at a 97% chance that at least two of them overlap. That's pretty huge. And the graph paints it quite well: it's an almost exponential-looking pattern, where you can see that as the number of people goes up, the probability climbs very quickly. And that's the end: two things happening at once, with some variations.
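The pair-counting argument above can be checked directly. With n people there are n(n-1)/2 pairs (253 for 23 people), which is why multiplying a single-pair chance by n badly undercounts. The exact chance that at least two of n items share one of d equally likely slots is 1 minus the chance that all n land in different slots. A minimal sketch, assuming slots are equally likely and independent; it reproduces the birthday figures, while the alert figures depend on how many time windows you assume, which you would need to pick for your own system as `d`.

```python
from math import comb, prod


def pairs(n):
    """Number of distinct pairs that each need comparing."""
    return comb(n, 2)


def p_shared(n, d=365):
    """Probability that at least two of n items fall in the same one of d slots."""
    if n > d:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_different = prod((d - k) / d for k in range(n))
    return 1 - p_all_different


for n in (5, 23, 70):
    print(n, pairs(n), f"{p_shared(n):.1%}")
# 5 10 2.7%
# 23 253 50.7%
# 70 2415 99.9%
```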

Any questions, if we have time? Anyone panicked about their systems and wanting to go rushing home? The man in the yellow shirt, who I will pretend not to know. Okay. Does anyone have a birthday in the last week of July? Last week of July? That would put you somewhere between the 23rd... No? One person. Okay. So two out of seven isn't very good odds. Some of you are liars. First week of September? Anyone? First week of September for your birthday: two, three, four, five, six... out of seven. Okay, so somewhere in that first week, made up of seven days, we've almost got enough to prove my point.

Any other questions? The guy next to the guy I pretend not to know, who I'll also pretend not to know? Anything and everything. Cool. Thank you, everyone. [applause]
