
Okay. Hi everyone. I'm Nimmed Mani. I'm the head of machine learning at iVerify, a cybersecurity company focused on mobile threat hunting. I'm also a lecturer at the University of Texas at Austin, where I teach a class on AI, and the co-author of Introduction to Generative AI, published by Manning Publications. Today I'm going to talk about breaking the mobile log analysis barrier. Mobile logs like Android bug reports and iOS unified logs contain really useful information, but they are so massive and so messy that making sense of them is a barrier in itself. And that's exactly where AI can help. Can people hear me fine? Okay, thank you. So first we
will talk about why mobile logs are so difficult and overwhelming to work with. Then we'll look at the approach we've developed at iVerify that uses AI to make sense of them in a secure way. And finally, I'll share how we're releasing this as a free community tool that anyone can try. Let's start by understanding what mobile logs actually are. An Android bug report is a system-generated package that captures diagnostic data from across a device: apps, system services, the kernel, and hardware. An iOS unified log is Apple's continuous logging system, which collects structured events across all subsystems. Both of them record system and app logs, running processes, battery and power usage, network activity, device state, crash and event logs, and sometimes even user data. This makes them incredibly valuable for forensics and security investigations. But in practice, one of the challenges is scale: a single Android bug report can exceed 200 megabytes and contain millions of lines, while a single iOS device can generate around six million log entries per day. That's roughly 70 every second. Here are some examples of what they actually look like. On your left is an Android bug report excerpt, and you can see that in addition to being massive, they're also fragmented. There are millions of lines mixing different formats, so your timestamps aren't consistent: you'll see wall-clock time, uptime since boot, and kernel seconds all intermixed. And mixed in are pieces of sensitive data like phone numbers and Wi-Fi identifiers. On your right is an iOS unified log excerpt. These logs are also massive in volume, often multiple gigabytes per device. They're deeply nested, they're verbose, and they're really hard to correlate across subsystems. And just like on Android, they can surface really sensitive details about apps, network activity, or location. So even though logs like this contain incredible forensic value, they're so overwhelming to analyze, and this is exactly the kind of barrier we need to break. This isn't just about the logs themselves; it's also about the bigger mobile security gap they create. That gap shows up in three ways: scale, expertise, and time. First, scale: the logs themselves are massive and fragmented. Second, expertise: there honestly just aren't that many specialists with deep mobile forensics knowledge. And third, time: manual review takes hours or days per device, and that simply doesn't work in fast-moving investigations. That combination of scale, limited expertise, and slow analysis leaves security teams without the visibility they need, especially when they need it the most. So what do we do today? Right now it's usually a mix of traditional methods. First, there's known-threat hunting, which means you check logs for indicators of compromise or signatures. You're looking for things like: is badapp.apk installed? But that only ever catches what you know; it'll only catch yesterday's malware, and it will never catch novel variants or zero-days. There's also spot checking, which is manually reviewing portions of logs. But you can't check everything, right? These logs are massive, so coverage is often incomplete and compromises might get missed. And then finally, there's incident response: you react after an alert comes in, but by then the damage is already done, and investigations can take days. All of these approaches have their place, but none of them solve the core challenge of scale and visibility in mobile logs. So this is why we need a new approach, one that uses AI to make sense of mobile logs at scale. Here's a high-level overview of our solution. It combines traditional natural language processing (NLP) with a targeted use of large language models to turn raw logs into timelines and reports that are secure, structured, and actionable. This is not about, you
know, dumping millions of lines of logs into an LLM and hoping for the best. The system adapts to Android and iOS differences, it reconstructs messy and fragmented timelines, and then it uses NLP and LLM analysis to produce reports. The pipeline starts with security validation: we check for things like malicious payloads, entropy spikes that might signal hidden content, and any injection attempts where attackers might try to manipulate the analysis. Then we go into platform detection: we distinguish between Android and iOS, adjust for version differences and manufacturer customizations, and pick the right parser. Then the logs flow into the NLP and LLM pipeline, which is the heart of the solution. The system first pre-processes with NLP: it redacts personally identifiable information, normalizes formats, filters out noise, and then reconstructs time across boot sessions, aligning logs that use wall-clock time, uptime, and kernel seconds into a single consistent timeline. Then we apply semantic chunking, which means we don't split these files into random pieces by size or line count. For example, if you decided to split every 10,000 lines, you would probably end up cutting a stack trace in half, or you might just separate a crash log from all of the events around it, and then you would lose a lot of context. What we actually do is group full stack traces, crashes, or security events into coherent units, then prioritize high-signal sections, and only then bring in the large language model and use it in a very targeted way: the model analyzes curated, structured chunks and outputs very focused findings. Finally, the system generates a report that adapts to the audience. Security researchers get technical evidence, IT admins get actionable summaries, and end users get clear explanations of what happened and what
they can do next. So the same raw log can produce different perspectives, each tailored to the right level of detail. One of the key challenges is that even though Android and iOS record the same kinds of security events, they express them very differently. I have an example here. On Android you might see something like an SELinux denial where the media server process is blocked from reading a file. On iOS you might see a sandbox violation where mediaserverd is denied file access. So the logs look really different if you look at the lines, but they mean the same thing, right? What they're both saying is: a media service is trying to access something that it shouldn't. Our system maps them to the same security meaning; in both cases, this is a media service security violation. And when we handle both platforms in one system, we can make analysis more consistent, reduce operational overhead, and give people and teams a single way to understand what is happening, no matter what the device is. I touched on this a little already, but the same logs can be explained very differently depending on who's reading them. For the device owner, the report explains things in very clear, everyday language. In this case: your device is not compromised, but some apps couldn't connect because of restrictions. That's just a clear statement of what happened. For an IT admin, the same event is presented in compliance terms: multiple VPN apps are detected, controls are in place, but a review is required. That's framed in a way that directly supports policy and fleet management. And for the researcher, full technical detail is preserved: VPN usage, blocked protocols, app-op denials, high file-descriptor usage. What they also get is line-level information and evidence that they can start to dig into. So the point here is that the underlying logs are the same, but the system adapts the explanation to the audience, each getting the right level of detail, shown in the way that's most useful to them. Another thing I want to touch on: when we first built the system, security wasn't an afterthought. It's part of the design from the very beginning, built into every stage of the pipeline. Before any analysis begins, the system validates the logs. It blocks attempts to slip in hidden instructions or binary payloads, and it checks for suspicious entropy patterns that might conceal malicious content. And then, because logs can contain and expose personal data, the system redacts personally identifiable information before the logs are ever processed. Everything runs in a private, enterprise-grade environment, so we can make sure the data is never shared externally or used for model training. What we also do at the end is score each file for potential threats: if a log looks manipulated or unusually risky, it's rejected, to protect both the user and the system itself. So every log that goes through this pipeline is handled with layered safeguards, making sure the analysis is not only useful but also private and secure. So what does this actually mean? What this really changes is the impact for the teams that work with mobile logs.
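To make those layered safeguards a bit more concrete, here is a minimal, illustrative Python sketch of two of the stages just described: redacting PII before any line reaches the model, and flagging high-entropy content that might conceal a payload. This is not iVerify's actual implementation; the patterns, threshold, and function names are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical redaction patterns -- illustrative, not iVerify's actual rules.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"), "[MAC]"),
]

def redact(line: str) -> str:
    """Replace common PII shapes before a log line is processed further."""
    for pattern, token in PII_PATTERNS:
        line = pattern.sub(token, line)
    return line

def shannon_entropy(text: str) -> float:
    """Bits per character; long high-entropy runs can signal hidden payloads."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def looks_suspicious(line: str, threshold: float = 5.5) -> bool:
    # Plain log text sits well below ~5.5 bits/char; base64-like blobs approach 6.
    return len(line) > 40 and shannon_entropy(line) > threshold
```

In a real pipeline these checks would be only the first gate: lines that pass would still go through normalization, timeline reconstruction, and chunking before anything reaches the LLM.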
So before, you had to be a deep mobile OS expert to get anywhere with these files; now, even without that expertise, security professionals can gain useful visibility into what is happening on a device. Investigations that would take hours or maybe weeks can now finish in 15 to 20 minutes. And instead of manually spot-checking very small portions of a log, the system processes the entire file through the pipeline, so we can make sure nothing gets overlooked. I think this also changes threat detection, right? Traditional methods can really only catch what's already known, because a lot of what you're doing is matching signatures or reacting after an alert. This approach helps you go further by analyzing the entire log and surfacing anomalies and suspicious patterns. What that means is we can start to highlight behaviors that don't fit normal system activity, and those kinds of signals might point to novel or unknown threats. And because this process is so much faster and more automated, the daily capacity also expands, maybe from one or two devices a day to dozens. At the same time, I don't think this replaces expertise; I think it amplifies it. A security researcher can now analyze far more devices and focus on interpretation, IT admins can escalate specific concerns with clearer evidence, and I think even end users benefit, right? Because they're more informed about what's happening on their device. So the balance here is democratization with depth: anyone can gain visibility, but experts still play a really vital role in judging what the findings mean. And for many organizations, this might be the difference between having little to no mobile forensics capability and being able to achieve visibility into what's happening on their mobile endpoints. Mobile forensics has traditionally required very niche expertise, very long turnaround times, and limited coverage. Even with some of the best people, this work has always been very complex, very slow, and very hard to scale. And that's the value of a system like this: it doesn't replace experts, but it makes their work a lot faster and more efficient, and it also helps give non-experts a way in. We've tried to take mobile log analysis from something that was unrealistic at scale and make it practical. Now let's look at a few case studies to see what this looks like in practice. Here's a snippet from an Android bug report. Even with just a few lines, we're seeing multiple things happening almost at once. There's an ANR in WhatsApp, so an application is not responding. There's something about
a low-memory killer taking out multiple apps, and there's CPU performance being throttled. At first glance, this looks like the phone is maybe unstable: apps are crashing, systems are slowing down, there might even be signs of compromise. Now, for a second, let's scale that up. Imagine that instead of the five or ten lines I'm showing you, you're looking at millions. And to make it harder, the timestamps don't line up: some use wall-clock time, some use uptime since boot, and others are kernel seconds. So even before you get to what these events actually mean, just figuring out the order they happened in is already a really big challenge. This is where the pipeline comes in. Here's that same log once it's been processed by our system. Instead of cryptic log lines about ANRs, the low-memory killer, and throttling, what you get is a clear, audience-tailored report for the device owner. What first looks like a crash or potential compromise is actually just the device under memory pressure: apps like Microsoft Office and Google productivity apps are running really slowly, WhatsApp froze, and the system started throttling the CPU to cope with heat. So there's really no compromise; it's just resource exhaustion. This is the timeline reconstruction for that report. The system takes these events from all the different log formats and aligns them into a single, coherent flow, so we can start to see the story a little more clearly. If you look at it: a single session opens, multiple apps start piling up memory requests, the device heats up, the low-memory killer terminates Google Maps, and right after that, WhatsApp locks up. So what maybe looked like very random crashes inside raw logs is actually just a resource management failure playing out step by step. Sometimes, if you misorder your events, you misdiagnose the problem. So having a timeline like this helps us piece together not just what happened but why it happened. Finally, the system generates clear recommendations. In this case, it suggests resetting app states, clearing memory, closing or removing the background-heavy apps that are driving the pressure, and looking at specific problem apps, for example, clearing the WhatsApp Business cache after the freeze. What you'll notice is that these recommendations flow directly from the evidence in the logs. So instead of leaving the user guessing why the phone was crashing, they now have a short, concrete checklist of next steps tied to the actual events on the device. Now, let's look at iOS. This is a
snippet from an iOS unified log. I mentioned this before, but these logs are massive; I cannot say that enough. They're often gigabytes per device, and they're deeply nested. Here are some entries that I hand-selected from one. You can see that we've got some repeated references to Cydia, we've got an unusual app install, and we've got a burst of location-tracking events. On their own, things like an app install or a spike in location activity might not raise a lot of alarms, right? They could look like normal, routine behavior. But when a lot of these patterns appear together across subsystems, they can start to suggest that something is off. The challenge here, though, is that unified logs scatter these clues across dozens of subsystems, and each has its own little format. You might see a process ID in one place, a location event in another, and an app reference somewhere entirely different, but you don't really see them together. So even when security professionals analyze these logs, they usually only look at certain fragments, because it's just not realistic to read through millions of nested entries and manually connect them into a coherent story. Here's what that same log looks like once it's been processed by our system. Instead of scattered references about apps and location activity, the device owner gets a clear report: your device appears to be jailbroken, and that's a critical compromise. The report also tries to explain some of the risks. It talks about how core protections like code signing and sandboxing might be disabled, and it flags abnormal location activity that could indicate surveillance or tracking. The point here isn't really that we've surfaced jailbreak traces; it's that we've taken millions of cryptic log entries and translated them into clear, audience-appropriate reports. This lowers the barrier to understanding while still leaving room for security professionals to provide deeper context when needed. Here's the reconstructed timeline for the iOS case. Once the system aligns events across subsystems, the chain of activity becomes a lot clearer, right? Jailbreak traces appear, core processes interact with them, network traffic follows, and then suspicious app and location activity stack on top. When we look at it in order, what was maybe scattered log lines actually tells a very coherent story, which is: your device is jailbroken. From there, the system generates evidence-based recommendations: disconnect, wipe and restore from a trusted backup, re-secure accounts, and in high-risk cases, you probably want to replace your device. So today we're releasing this solution as a free community tool that's available for anyone to try. Mobile forensics has always been limited to those with specialized tools
and rare expertise. What we built does not replace that expertise, but it amplifies it. Security professionals can now cover far more ground with greater consistency, and non-technical users can finally gain visibility into problems that were once hidden inside logs. I think this balance is really important: experts provide judgment and context, and AI just makes the process a lot faster, more accessible, and more scalable. And everyone really deserves to understand what's happening on their device, because mobile security should not be out of reach for anyone. And this brings us to the Bugalyzer. The Bugalyzer is our free community tool for Android bug reports. It takes the same pipeline that we've walked through and makes it available to anyone. It's designed for the device-owner audience, so the output is just a clear, easy-to-understand report about what is happening on your phone. I'm going to repeat this: it's not a replacement for professional forensics. Deeper context from security professionals might still be required, but it does lower that barrier. Now everyday users can get visibility they've never had before, and security professionals have a faster, more consistent way to work. So you can try it today. Bugalyzer is free, it's secure, and it's available right now at bugalyzer.verify.io. All you have to do is upload a bug report, and you'll get back an explanation of what's happening on your device. This is our first step toward making mobile forensics more accessible, while still recognizing that some situations need professional expertise for deeper investigation. But ultimately, what we want to do is give users and teams a clear way to understand and communicate what is happening on their devices. Mobile devices generate endless logs, and inside those logs are a lot of answers: to crashes, performance issues, and security threats. And the problem is that, until now, those answers have been out of reach for a lot of people. With Bugalyzer, we've tried to make those answers visible, structured, and actionable. So you have the logs, and we can give you the answers. And that's all I've got. Thank you so much for your time today. I know it's been a really long day of sessions, so I really appreciate you hanging in there with me. I think we can open it up for questions now. I think we maybe have like a minute or two. Five minutes? Amazing. So we've got five minutes; we can do a couple of questions. But if you'd like to continue the conversation afterward, my email's right there, and you can also find me on LinkedIn. We also have some people from iVerify here, happy to chat. Okay. [applause] Yes. >> Um, right now we're using Claude.
Yeah. >> We're using it in a private enterprise environment, so we're using it through Bedrock. >> Yeah. So Anthropic doesn't actually get any of your data. >> Yeah. Yes.
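[For context on that answer: running Claude through Amazon Bedrock means the model is invoked inside your own AWS account rather than a public endpoint. A minimal sketch of what such a call can look like with boto3's Converse API follows; the model ID, prompt wording, and function names here are illustrative assumptions, not iVerify's actual code.]

```python
# Illustrative sketch of calling Claude via Amazon Bedrock so that log data
# stays inside a private AWS environment. Model ID and prompt are assumptions.
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def build_converse_request(chunk: str) -> dict:
    """Package one curated log chunk as a Bedrock Converse API request."""
    return {
        "modelId": MODEL_ID,
        "messages": [{
            "role": "user",
            "content": [{"text": f"Analyze this mobile log excerpt:\n{chunk}"}],
        }],
        # Deterministic, bounded output for analysis tasks.
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.0},
    }

def analyze_chunk(client, chunk: str) -> str:
    """client is a boto3 'bedrock-runtime' client; data stays in your account."""
    response = client.converse(**build_converse_request(chunk))
    return response["output"]["message"]["content"][0]["text"]

# Usage (requires AWS credentials and Bedrock model access):
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   print(analyze_chunk(client, "avc: denied { read } for pid=1234 ..."))
```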
>> Yeah. I think it was just very traditional, the way you would work with an LLM: a lot of instruction tuning, a lot of prompt engineering, and we had some guardrails around hallucinations. So very typical LLM tuning is what you'd expect there, but very targeted to our use case. >> Yes. >> [Inaudible audience question.]
So what's really nice is that the way we've built it, it should be able to generalize to a lot of different kinds of Android customizations. I think that's one of the really nice things about using an LLM for something like this: it's a lot easier to generalize versus traditional NLP. For the traditional NLP part, I do think it's broad enough that it will still generalize, because we have some adaptive classifiers that learn from the new data coming in to classify the high-security events accordingly. But that's the part I would imagine might take a little bit of tweaking, though not drastically enough to throw off your results by too much. I don't know if that answers your question, but yeah. Great. >> Yes. >> [Inaudible audience question.]
>> Yeah. I do think you could use something like this for any kind of logs, right? I like to think of it more as a translation problem than anything, because you're taking something that maybe isn't very human-readable and turning it into something that is. So I think that concept can really work for any kind of log. In terms of hallucinating and flagging stuff that it shouldn't: it doesn't do it too often, but it does do it sometimes, and that's the nature of working with probabilistic models. We have built some mitigations in there, which is why it doesn't do it too often. But this is also why I've emphasized three or four times that security professionals might still be needed, and you might still need deeper context, because a lot of times, if you have someone who has worked in security before, they're able to better tell, oh, this is maybe a hallucination and we need to dig into it more, versus just taking it word for word. >> Yes.
Yeah. So we actually haven't released a solution for iOS yet, but internally we've used it for iOS unified logs and then crash logs on the iOS side, and it's worked pretty well for both. >> Yes.
Well, one of the things we've done for the researcher audience, which I didn't talk a lot about today, and it's also one of the ways we mitigate hallucinations, is that I'll actually make it give us the log evidence for any claim it makes. It will go through: here's a line number, here's exactly where to go find it in your log. That log-level evidence helps get it a little more grounded in facts. So it helps mitigate some hallucinations, but at the same time, it can help you figure out, okay, this is something I maybe haven't seen before. Let me go dig into that. Is there something like that in a different log or not? So that's how we have it set up right now, and that's how we've been using it internally. But we haven't done anything where we're tracking all of the ones that are different and storing them somewhere. Maybe in the future. >> Yes.
So, yeah, what we do do is keep track of essentially false-positive rates and true-positive rates and how much of that we're seeing, so we can tweak the models accordingly when we have to. We do have some mitigations in place through instruction tuning, some prompt engineering, getting it more grounded in facts, and things like that, which does help reduce it quite a bit. But of course, with probabilistic systems you can't really fully eliminate them. So yeah, but this will be really cool for a lot of people to try, I think, and then we can make it better. >> Yeah. Okay. Well, thank you so much
I'm sorry, I couldn't hear you, but if you'd like, I'll chat with you right here?