
BSidesSF 2026 - Detecting Race Conditions on macOS (Olivia Gallucci)

BSidesSF | 33:48 | 14 views | Published 2026-05
About this talk
Detecting Race Conditions on macOS, by Olivia Gallucci. GCD powers macOS concurrency. Misused queues and QoS can trigger race conditions, deadlocks, and sandbox escapes in privileged services. Using a GSSCred sandbox-escape race, we will map failure modes to code reviews and telemetry for daemon behavioral detections. https://bsidessf2026.sched.com/event/e20eec100ebe1454984197986ffdd572
Transcript [en]

We are starting our next session pretty soon. We will have Olivia Gallucci talking about detecting race conditions on macOS. Before we begin, I'd like to thank you all again for visiting our theater. This is the Diana Initiative collaboration with BSidesSF 2026. This is the first time we've been doing it, so thank you for showing some support, and it was really nice to see you all check out our merch. After the talk, if you would like, you can come get some more free goodies. We also have QR codes to join our Discord or check out our website. We will be having an online conference pretty soon. If you would like to participate, if you would like

to have talks there, please get involved. Come talk to us if you are interested. We do have a resume review session at 3:15 today, same place, right here.

Other reminders: we only have coffee today until 4:00 p.m., so if you would like to get some, please get it before then. We have headshots upstairs, and you can go check that out as well. If you need prayer or mothers' rooms, those are also available at the info desk. And if you would like a quiet place to do some crafting and things like that, you can find that right here. We have those supplies right up front. But with that, let's get to our talk. So, Olivia Gallucci from Datadog, talking about macOS race conditions, take it away. >> Thank you. So, hi everyone. Thank you so much for having me. My name is Olivia Gallucci

and I work at Datadog. Today I will be discussing how to detect race conditions on macOS, focusing on how misuse of Grand Central Dispatch, or GCD, leads to failures in privileged system services. Before we dive into the talk, I want to cover some basics about processes, concurrency, interleavings, and parallelism, starting with a quote from Jonathan Levin. The concept of a process is inherent to all operating systems. A process corresponds to an instance of a running executable. Processes are used by the system as containers for resources like virtual memory or descriptors and for maintaining execution statistics. It's important to note that processes are not runnable entities. A process corresponds to an instance of a running executable, but

not the actual running code. The runnable entities in the executable are the threads. The process itself is technically only a container for one or more threads and provides the virtual memory image with the descriptors and ports shared by all the threads. Thus, when one refers to a process P executing, the correct terminology is that at least one of the threads of process P is executing. While we won't go into this level of detail for most of the talk, it's important to understand what's actually happening. As a recent grad, I interpreted phrases like "the process is executing" literally, when in fact they refer to more abstract concepts. I hope this clarification can help others avoid similar confusion

when reading blogs and doing concurrency research. Another concept is concurrency. It's the system's ability to make progress on multiple tasks by overlapping or interleaving them. Think about using a single core or limited resources, or two lines of people trying to use one vending machine. Interleavings are the different possible execution orders in which operations from multiple threads or tasks can overlap when accessing shared resources, often leading to non-deterministic or unpredictable behavior. Then there is parallelism, which refers to actually executing multiple tasks simultaneously on multiple cores. Overall, concurrency improves responsiveness and resource utilization even when multiple tasks aren't running in true parallel. So this talk is about both concurrency and

parallelism. Concurrency is where the logic and timing bugs appear. Meanwhile, parallelism is a supporting concept in how GCD scales execution and how that scaling can go wrong, for example through thread overcommitment and interleaving of privileged and unprivileged operations. So now we can start by discussing Grand Central Dispatch. GCD is Apple's concurrency framework. It's also a system library that manages task queues and execution without explicit thread management by developers. These task queues hold and schedule blocks of work to be executed later, while threads are the system resources that actually run and perform those tasks. GCD is widely used in system daemons and cross-process communication services. It manages dispatch queues to organize work into

tasks and quality of service, or QoS, classes to prioritize which tasks get scheduled, when they run, and which get system resources. Dispatch queues in GCD come in two types. The first is serial queues, which ensure tasks run one at a time in the order they're received. The second is concurrent queues, which allow multiple tasks to execute simultaneously, managed by the system scheduler. Developers can create dispatch queues or use global ones with specific QoS levels, and those queue choices become security relevant once you factor in QoS. QoS classes, for example user-initiated, user-interactive, and background, influence scheduling priority. The

kernel uses QoS to prioritize CPU allocation, affecting responsiveness and interactivity. Under the hood, they're public API contracts defined by Apple and implemented end to end across libdispatch, pthreads, and the kernel scheduler. They are the signal that the system uses to allocate resources under contention. In security-critical services, correct QoS ensures that high-priority operations aren't delayed behind lower-priority tasks, which could otherwise create exploitable timing windows. Issues that can arise from this include race conditions, time-of-check to time-of-use (TOCTOU) bugs, and other concurrency bugs. A race condition occurs when two or more executions access shared state without proper synchronization, such that the scheduling interleavings change outcomes. If you want to learn more about these on macOS, shout out to myself, but I

have a blog post on TOCTOU attacks as well as a podcast episode with Hackers on the Rocks, also on TOCTOU attacks. GCD does not automatically prevent these attacks. Correct usage of queues and serialized access is required to avoid races. When dispatch queue target hierarchies or QoS propagation are misconfigured, this can create temporal vulnerabilities such as race conditions, deadlocks, priority inversions, and even sandbox escapes in privileged contexts. Now, let's take a closer look at some of these vulnerabilities. Priority inversion occurs when a high-priority task ends up waiting on a lower-priority task. On Darwin, the scheduler can indefinitely preempt (in effect, starve) low-QoS threads in favor of higher-QoS threads, which is

unusual among operating systems. In practice, this means a low-priority thread holding a resource like a lock might never get scheduled to release it if a high-QoS thread keeps running, leading to a priority inversion deadlock. This is especially acute with a spin lock, where a thread that can't acquire the lock does not sleep. Instead, it busy-waits in a tight loop, aka spins, repeatedly checking until the lock becomes free. Using a plain spin lock, aka a busy-wait with no scheduler help, across threads at different QoS levels can freeze programs on Darwin OSes, since higher-QoS threads can preempt the lower ones. In

such a case, a higher-priority task could be blocked behind that lower-priority thread. Apple's kernel does implement priority inheritance in certain synchronization primitives, but you have to be using very specific ones. A mutex, for example pthread_mutex_t, is a traditional lock that provides mutual exclusion and support for priority inheritance. An unfair lock, os_unfair_lock, is a lightweight low-level lock that lets the system avoid the overhead of fairness but still enables that priority-boosting mechanism. For example, if a high-QoS thread waits on a lock held by a low-QoS thread and that lock is a mutex or an os_unfair_lock, the kernel will temporarily boost that low-QoS

thread's priority to match the high-QoS thread, at least until that lock is released. In other words, using a lock like pthread_mutex or os_unfair_lock, or a dispatch queue, which internally handles QoS, allows the system to raise the low-priority thread's QoS so it doesn't stall the higher-priority thread. However, if you use synchronization mechanisms that don't support QoS inheritance, you can easily expose priority inversions. As the blog Exoria put it, you might want to avoid reader/writer locks, semaphores, and custom lock implementations, because these can't participate in QoS inheritance, and it recommends using os_unfair_lock or similar instead. This is because the scheduler has no way to know which thread should be

boosted when using those primitives. For instance, a dispatch semaphore or spin lock doesn't carry ownership information, so a high-priority thread waiting on a semaphore that a low-priority task is supposed to signal will simply wait, with no priority boost for the low-priority task. Apparently, many priority inversion issues in macOS have come from misuse of semaphores and spin locks, and it's called the internal semaphore anti-pattern. From what I can tell, it's called this because it's a very common pattern that looks convenient, a way to bridge async work into a sync control flow, but it breaks the scheduling and progress guarantees that Apple's

concurrency stack is built around, leading to priority inversion and sometimes a deadlock. The net effect is that high-QoS work items synchronously wait on lower-QoS tasks that can stall unexpectedly. There are a few ways to test this on your Mac, but the main way is to use Xcode's Thread Performance Checker, which is a runtime tool. Essentially, it automatically detects when priority inversions are happening at runtime. You can enable it in your scheme settings and run your app to get warnings. When you run the app with this enabled, Xcode will log a warning if it catches any higher-QoS thread waiting on a lower-QoS thread. For example, you might see a warning similar to

what is displayed on screen, which would indicate a potential priority inversion. Using this tool is an easy way to watch for QoS mismatches in real time, especially if you're trying to learn about vulnerability research on Mac. The checker will flag both priority inversions and things like non-UI work running on the main thread, helping you catch these problems early. What I've learned from this is that when writing code or during code reviews, we should flag any scenario where a high-priority queue or thread is blocked on work scheduled to a lower-QoS queue. That QoS mismatch is a red flag for potential priority

inversion. These patterns might not crash, but they create a window where a high-priority task is needlessly impeded. Again, the kernel may band-aid the issue by boosting thread priority temporarily, but that won't fix the underlying logic issue of the misprioritized work. Outside of priority inversion and QoS mismatches, we also have dispatch_sync deadlocks, a specific type of deadlock that occurs on serial queues, or at least in this case they do. dispatch_sync is a synchronous submission to a queue, and using it can cause deadlocks when the target queue is the same as the caller's queue. The call blocks until the block completes, and then when

the queue itself is blocked, that creates a deadlock. The function dispatch_sync will synchronously execute a block on the target queue, meaning that it blocks the calling thread until that block finishes. The danger is when the target queue is the same serial queue that the caller is already running on, or a queue that it already depends on. You can think of something like the main queue when it's called from the main thread. In that case the deadlock would arise immediately, and as con explains, imagine if you call dispatch_sync and target the current queue that you're already running on. This would of course again result in a deadlock because the call

will wait until the block finishes, but the block can't start until the currently executing task is finished. In other words, the thread just waits on itself. This commonly occurs, for example, if a service's listener queue tries to call back on itself synchronously. The code will freeze because the queue is never free to run a new task. The same pitfall applies to the main dispatch queue, which is a serial queue. Calling dispatch_sync on the main queue from the main thread will hang the app, and Apple's documentation explicitly warns against this scenario. But even without referring to Apple's documentation, many developers have blogged about it. In one case, I saw a developer accidentally create a startup deadlock by queuing work to

background threads that each did a dispatch_sync on the main queue. This flooded GCD's worker threads, and unfortunately the main thread itself made a synchronous dispatch call as part of some sort of API and was waiting for a free worker thread, which of course never came since all of them were busy waiting on the main thread. That created a circular wait, and as you can assume, the app froze instantly. The lesson from this was that you never really want to be calling dispatch_sync on a queue from within that same queue, or in any scenario where you could be waiting on yourself. The correct approach is to use dispatch

async. So it's not dispatch_sync, it's dispatch_async, for cross-queue calls, or restructure the code that you've already created so that it's not using any sort of synchronous callbacks. Another thing I really recommend is to scan for dispatch_sync calls in general. If you find dispatch_sync targeting a serial listener queue or the main queue from within a callback, you've likely found a bug. In code reviews, I recommend flagging patterns like this because it's almost always guaranteed to deadlock the service, even if it's rare to trigger. Lastly, we have resource starvation via worker thread busyness.
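Editor's sketch: as a rough illustration of the dispatch_sync scanning advice above, before the talk turns to starvation, the following Python snippet flags synchronous dispatches in source text. It is only a line-based heuristic under stated assumptions. The function name find_sync_dispatches, the severity labels, and the sample snippet (including worker_queue, listener_queue, and the helper calls inside the blocks) are hypothetical and do not come from the talk. A regular expression cannot tell which queue the caller is actually running on, so every hit still needs human review.

```python
import re

# Heuristic patterns for synchronous dispatch in C/Objective-C and Swift sources.
# dispatch_sync, dispatch_sync_f, dispatch_get_main_queue and DispatchQueue.main.sync
# are real API names; everything else in this sketch is illustrative.
SYNC_CALL = re.compile(r"\bdispatch_sync(?:_f)?\s*\(\s*([^,]+),")
MAIN_QUEUE = re.compile(r"dispatch_get_main_queue\s*\(\s*\)")
SWIFT_MAIN_SYNC = re.compile(r"\bDispatchQueue\.main\.sync\b")


def find_sync_dispatches(source):
    """Return (line_number, severity, text) for each synchronous dispatch found."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = SYNC_CALL.search(line)
        if match:
            # The first argument is the target queue; the main queue is the riskiest target.
            target = match.group(1)
            severity = "high" if MAIN_QUEUE.search(target) else "review"
            findings.append((number, severity, line.strip()))
        elif SWIFT_MAIN_SYNC.search(line):
            findings.append((number, "high", line.strip()))
    return findings


# Hypothetical source lines used only to exercise the scanner.
SAMPLE = """\
dispatch_async(worker_queue, ^{ refresh(); });
dispatch_sync(dispatch_get_main_queue(), ^{ update_ui(); });
dispatch_sync(listener_queue, ^{ handle(message); });
DispatchQueue.main.sync { render() }
"""

for number, severity, text in find_sync_dispatches(SAMPLE):
    print(f"line {number}: [{severity}] {text}")
```

In this sketch, "high" means the call targets the main queue and "review" means some other queue whose relationship to the caller has to be checked by hand. The asynchronous call on the first sample line is deliberately not reported.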

Excessive dispatch operations without completion, for instance heavy concurrency within a concurrent queue, can cause the thread pool to saturate or starve other components. This can appear in telemetry as high thread churn or CPU spikes. Even if you avoid direct deadlocks, misusing GCD can lead to resource starvation, especially when you use up all of the available worker threads or the CPU such that other tasks end up starving. GCD uses a thread pool under the hood to run these concurrent queue tasks. If you dispatch a huge number of tasks, or those tasks block without completing, you can saturate the thread pool. And as you can

see here, GCD is a queue-based API. It abstracts the thread-level management away from the developer. There's a worker thread pool, which is a collection of threads managed by the system, and the tasks are always dequeued first in, first out, regardless of how they're executed. Early GCD documentation promised that the system would smartly limit thread creation, but time showed that it was easy to hit pathological cases. One developer said that he was shocked that after he adopted libdispatch very heavily, he ran into thread explosion, which was really surprising because they expected the number of threads to be more or less the same as the number

of cores. Apple's response to this was to remove synchronization points and go async all the way. In other words, they discovered that their app was spawning dozens of threads, far beyond core count, due to tasks blocking each other. In an extreme case, if all the GCD threads are busy, especially if they're blocked waiting on something, the system may create even more threads to try to break the stalemate. In fact, the libdispatch thread pool will spawn additional threads if the existing ones are blocked, to avoid a deadlock, which can lead to thread explosion in addition to the priority inversion problem. This means your process could suddenly have tens or hundreds of

threads just thrashing the CPU. Such CPU churn not only hurts performance, but it can also starve other system components of CPU time since the scheduler is already busy handling all those threads. A symptom in telemetry might be high thread count or rapid thread creation and teardown, and then sustained CPU spikes without any apparent increase in work being done. Of course, Apple has learned from many years of experience in this area. One now-abandoned API, Security Transforms in Mac OS X 10.7, inadvertently created a new queue and thread per task, causing severe thread proliferation. Many daemons in macOS and iOS 12 were even later rewritten to be single threaded to improve performance, reflecting the

realization that unconstrained concurrency can backfire. In summary, to avoid resource starvation, you should limit the number of concurrent dispatches and especially avoid blocking calls on those threads. If you have a situation where lots of worker blocks are stuck, for example waiting on locks, semaphores, or synchronous calls, you risk both priority inversion and thread pool explosion. Now that we've covered how GCD's queue semantics and scheduling choices can create priority inversions, deadlocks, and starvation, we should ground ourselves in an incident where this mattered a lot. In practice, the most dangerous bugs show up when a privileged system service assumes serialization, but the actual execution model is concurrent due to queue

configuration mistakes. A very famous 2018 CVE in the GSSCred XPC service was exactly this: a dispatch queue targeting error that turned unexpected interleavings into an exploitable race condition and ultimately led to arbitrary code execution in a root context. A little bit more history here. What is com.apple.GSSCred? It is a macOS identifier written in Apple's reverse-DNS naming style. The com.apple prefix is Apple's namespace, and the reverse-domain convention makes the name globally unique and, in this case, owned by Apple. macOS uses these bundle identifiers as stable IDs to label and route apps and system services across the OS so that they show up in

places like logs, entitlements, Launch Services, and permissions decisions like the Transparency, Consent, and Control (TCC) database. The GSSCred portion is the specific component, referring to a built-in system service involved in managing Generic Security Services credentials. That's the GSS part. You can think of things like Kerberos and enterprise SSO tickets. From a detection perspective, seeing com.apple.GSSCred in auth, keychain, or IPC/XPC telemetry is often normal SSO behavior, but it's also a useful pivot. Correlate it with the calling processes, the timing, and the volume to spot things like suspicious impersonation, unusual ticket operations, or unexpected processes trying to trigger credentials. So a security researcher named Brandon

Azad found a high-impact race condition in the GSSCred XPC service. It allowed an unprivileged process to trigger a memory corruption condition in a privileged root service reachable via XPC, which led to arbitrary code execution within that context. The root cause was that the service created a serial dispatch queue for handling events but failed to set this queue as the target queue for clients' XPC connections. Because of this omission, the connection handler executed on a default concurrent queue instead of the intended serial queue. In turn, message handlers ran concurrently and violated assumptions about serialization. The exploit mechanism was that two requests could interleave such that a credential was freed by one handler while another

was still processing it. This exploited a timing window, causing controlled memory corruption and allowing attacker-controlled data to be used, leading to arbitrary code execution. The business impact here is that assumptions about serialization can be broken by misconfigured queue and XPC targets, and services that rely on implicit serialization must explicitly enforce it. Race conditions in general can be exploited without a kernel compromise. This exploit, despite the CVE itself being patched, is why our work today focuses on GCD and XPC interleavings. A single queue targeting mistake turned a should-be-serialized privileged service into a concurrent handler, creating a race window that an unprivileged client could reliably hit for memory corruption and then exploit

for root-context code execution. So what to look for in reviews and telemetry is any privileged XPC service that assumes single-threaded state but has concurrent execution paths, like a missing or incorrect target queue, shared mutable objects across handlers, and sync waits across queues, plus any suspicious pattern like repeated rapid-fire XPC messages that correlate with crashes, restarts, or anomalous credential activity from GSSCred. This bug was not special to GSSCred. It was dangerous because a sandboxed unprivileged client could reach a privileged XPC service and exploit a concurrency mistake to cross a privilege boundary. Moving from that CVE to sandboxing and XPC frames this in a broader detection context. We need to

identify which daemons sit on those boundaries, like root or entitled services reachable via XPC, and then focus reviews and telemetry on handler-level races and trust assumptions that can turn normal IPC into sandbox escapes. The macOS sandbox confines apps, but system daemons with elevated privileges often expose XPC interfaces to sandboxed clients. Race conditions in handlers can escalate privileges or bypass isolation boundaries. XPC encapsulates Mach IPC with language bindings, and Apple's docs highlight IPC setup and message delivery. Why this matters for security is that a vulnerability in an XPC service that runs as root or with entitlements can act as a sandbox escape vector. This elevates unprivileged code into a privileged context, undermining endpoint

security posture. Now that we've discussed why these bugs matter, that sandboxed clients reach XPC services and subtle GCD queuing mistakes can turn that boundary into either an exploit path or a reliability failure in security-critical services, I want to discuss some examples of why this is good and actionable for detection engineering. I will first start with static review patterns. The first one is XPC connection queuing. When a daemon accepts a new XPC connection, we want to verify that certain approved functions are always called. If that doesn't happen, message handlers may execute on an unintended queue with concurrency characteristics we did not expect. This is a strong signal because it's easy to codify and it maps

directly to race exposure in privileged services. The second pattern is serial versus concurrent execution assumptions. A lot of unsafe code looks correct only if you assume requests arrive one at a time. In practice, XPC clients can create parallel pressure very easily. So in review, we should look for logic that implicitly depends on serial handling, especially around authorization state, object life cycle, or shared caches. If the implementation relies on ordering guarantees that are not explicitly enforced, again, that's a high-value finding. The third signal is synchronous calls, especially dispatch_sync. I think any use of dispatch_sync in a privileged daemon at this point should be treated as suspicious and should require explicit justification during review. Of course,

not every synchronous dispatch is wrong, but it often indicates blocking behavior, lock inversion risk, or a path towards a deadlock under load. This kind of construct may behave fine under happy-path testing but still fail badly under adversarial testing. The last pattern is shared mutable state without synchronization. We want to identify globals or shared objects that are accessed from multiple handlers without locks, atomics, or dedicated serial queue funnels. This is one of the most common root causes behind timing-dependent flaws. From a business standpoint, I think this is also where secure coding guidance can have the biggest return, because the same review rule prevents both reliability

defects and exploit primitives. So static review gets us candidate weaknesses. Telemetry helps us see where those weaknesses are becoming an active operational risk. The first telemetry signal is crash patterns. If we see repeated crash patterns, assertions, or guard failures inside a privileged daemon, especially at timing-sensitive code paths, that's a pretty big red flag. A crash in isolation might look like a stability bug, but a repeated crash around state transitions, cleanup paths, or request handling boundaries probably indicates a race window being hit, intentionally or accidentally. Regardless, it's a strong signal for both detection and prevention. The second signal is thread churn spikes. If a daemon suddenly starts creating an unusual number of threads, or

if queue drain behavior starts to change sharply, that can mean the service is under concurrency stress that it wasn't designed for. In detection terms, this is valuable because attackers probing race windows often generate this exact kind of pressure. Even when it's not malicious, the signal still points us to code paths where weak concurrency controls lie. The third signal is queue backlogs. Wait times on queue dispatch, heavy synchronous waits, or evidence that work is piling up faster than it drains are all useful indicators. These conditions often show up before a visible crash. They can suggest deadlocks, priority inversion, or lock contention. For us, that means that we might be able to detect

exploitation, or at least attempts at exploitation, before it ever really occurs. The final layer is behavioral detection. First, we should flag processes with frequent thread churn and spikes relative to a baseline. The idea is relative deviation, not absolute volume, because some daemons are naturally really noisy. What we care about is when the process starts behaving differently from its normal profile. That makes the detection more robust and reduces false positives. Second, we should look at rate limiting certain XPC invocations, at least locally. If a client is issuing many parallel requests to a privileged service, it might be trying to probe race windows. This is especially interesting when request volume is high, concurrency is high, and

the target daemon normally expects a low or moderate level of parallelism. Even if we do not immediately block the activity, we should at least log it, score it, and then correlate it with crashes or queue delays. Third, we should detect queue drain time anomalies. Elevated synchronization wait times can be a strong signal of deadlocks, lock contention, and scheduling inversion. This is also a good example of where performance telemetry meets security telemetry. The same measurement that helps SRE or platform engineering can also help threat detection engineering identify adversarial timing attacks. So overall, the model is pretty straightforward. Static review tells us where race conditions are likely to exist. Telemetry tells us where those weak

points are being stressed, and behavioral detections tell us whether that stress looks normal or adversarial. So now that we know the problems and the detection opportunities, let's go over what we covered today. We learned that race conditions on macOS are not just reliability bugs. At the right service boundary, they become security problems pretty quickly. We looked into how GCD works and where common concurrency hazards lie, where queue design, QoS, and synchronization choices matter in privileged code. We also covered how priority inversions and dispatch_sync misuse can create deadlocks, starvation, and timing windows that are difficult to catch in normal testing. And more importantly, we grounded that in the GSSCred instance

where a queue misconfiguration broke assumptions around serialization and turned concurrent interleavings into root-context code execution. From a detection engineering perspective, the takeaway is also pretty straightforward. We should review for unsafe queuing patterns, watch telemetry for thread churn, crashes, and queue backlogs, and then treat architectural assumptions about serialization as something that is always security relevant. I think if we do that well, we can catch these issues a lot earlier and reduce exploit opportunities in all privileged macOS services. So yeah, I have more in terms of where you can go from here. If you're someone who likes to do really fine-grained nonsense on Mac, I have a little

newsletter. And yeah, thank you so much for providing me the opportunity to present. I hope you all enjoyed it, and let me know if you have questions.
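Editor's sketch: the baseline-relative thread churn detection described in the talk (flag relative deviation, not absolute volume) could look roughly like the Python snippet below. This is a minimal sketch under stated assumptions, not a production detection. The function name churn_alerts, the window and threshold defaults, and both sample series are hypothetical, and how per-process thread counts get collected is left out entirely.

```python
from statistics import mean, pstdev


def churn_alerts(samples, window=5, threshold=3.0):
    """Flag samples whose thread count deviates sharply from a rolling baseline.

    samples is a list of periodic thread counts for one process. Each sample is
    scored against the mean and standard deviation of the previous `window` samples.
    """
    alerts = []
    for index in range(window, len(samples)):
        baseline = samples[index - window:index]
        center = mean(baseline)
        spread = pstdev(baseline) or 1.0  # avoid dividing by zero on a flat baseline
        score = (samples[index] - center) / spread
        if score >= threshold:
            alerts.append((index, samples[index], round(score, 1)))
    return alerts


# Hypothetical thread counts sampled at a fixed interval.
QUIET_DAEMON = [4, 5, 4, 5, 4, 5, 4, 48, 5, 4]            # normally calm, one sudden spike
NOISY_DAEMON = [40, 55, 42, 60, 45, 58, 41, 57, 44, 59]   # always busy, no real change

print("quiet daemon alerts:", churn_alerts(QUIET_DAEMON))
print("noisy daemon alerts:", churn_alerts(NOISY_DAEMON))
```

The quiet daemon's jump to 48 threads is flagged because it is far outside its own recent profile, while the noisy daemon never alerts even though its absolute counts are higher. One known weakness of this simple rolling window is that a spike inflates the baseline for the next few samples, so a real detection would likely want a more robust baseline.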

I also can't really see, so if anyone's... Yeah, it's very bright. I'm not going to look at you. I'm just going to look down because of the light. So,

so I would have to look into that specific subsystem and subset directly to actually explain what the problems might be. But I feel like anytime you develop a new system, there are always going to be new problems. So just by the fact that they're reintroducing something, I would say yes. In terms of what they actually are, I don't know. So hopefully that helps. Any more questions? Okay.

100%, 100%. A big reason why I did this presentation is because I like AI. I use AI and all that stuff, right? But a lot of this is very historical and context dependent. Around a year ago, ChatGPT, for example, couldn't even tell you where logs are stored on macOS. I actually tested this once. I took screenshots of asking it 10 different times in 10 different chats where logs are stored, and it just couldn't tell me. It made up directories and all this stuff. What I've realized, though, is there's enough garbage code online already from places like Stack Overflow where it's like, "Hey, I have this problem.

Can you help me?" So if you ask your code generator to generate you a bunch of failed examples, it helps with detections and with detecting this type of bad workflow when you're trying to review for it, if that makes sense. You ask it to generate a bunch of bad examples and then use those to ask, "Hey, is this occurring here?" Because almost all of this stuff is something you need to actually search for. You need to go in and check for it, and AI won't naturally say, "Oh, your concurrency queue might be kind of misaligned." It just won't do that, especially if you're using Objective-C. Swift

it's better with, but Objective-C it's just not. So I would say just ask it to generate bad stuff and then use that to ask, is that going to happen? Because you're very much right that it's going to happen, or it does happen already. So yeah, >> does that help? >> Cool. Any more questions? >> That's all the time we have for now. Okay. >> But we do have 15 minutes before our next talk, so if you would like to ask Olivia more questions, please do so. Thank you, Olivia. Thank you again.
