
The Sound of Secrets

BSides Leeds · 2026 · 28:18 · 36 views · Published 2025-08 · Watch on YouTube ↗
About this talk
James Bore explores sonification—encoding data as sound—to detect patterns in plaintext versus encrypted information. Through live audience experiments, he demonstrates that humans can distinguish encrypted data from cleartext when converted to harmonic chords, drawing on Shannon entropy and Kolmogorov complexity to explain why patterns disappear under encryption. The talk argues for applying audio-processing capabilities to cybersecurity monitoring, inspired by techniques already used in medicine, sonar, and nuclear facilities.
Transcript [en]

There was one lie in that little introduction, which is I cannot bake at all. So, Sound of Secrets: you're all going to take part in an experiment. First of all, who's heard of sonification? Yeah. For those who haven't, sonification is taking data or information and turning it into sound. It really is that simple. Arguably, playing music from a musical score is sonification. But equally, you can encode anything as anything else. So, with the samples that you will be hearing, what I've actually done is taken them as ASCII strings, broken them down into six-bit chunks, created a list of 64 harmonic chords, and then just encoded the chunks straight across. So that's what you'll be hearing. Now the point of this

experiment we'll get to. But you can encode anything as anything else. Sonification is encoding anything as sound, and the purpose of it is to aid human perception. We are very good at hearing things, and we're very good at processing what we hear. So why sonification, particularly in cyber security? It's a very good question, but we process our senses in different ways. We have visual processing circuits and we have audio processing circuits. They are later integrated in the brain to give us a nice big rounded picture of the world, but the actual processing is separate, which means we can have interrupt requests. If you've ever been focused on something and you hear a beep or you hear a noise

somewhere and your attention immediately goes to that, that is a human interrupt request. And audio gets a surprisingly high priority in that. And even better, you don't have to focus on it for that to work. So, if you've ever played spot the difference games, you know how hard it is just to spot an anomaly visually. And that's because so much of what we see is invented by our brains and pieced together. But what we hear is much more raw data. It's a much more fundamental sense. So we don't have to pay attention in the same way in theory, my theory at least, to pick out anomalies. >> And we've got that pattern recognition, you know, if a sound is melodic or not.

You know if it's off-key. You know if something doesn't sound quite right. If it's 3:00 in the morning and a floorboard creaks and you're familiar with the house, you know whether that's just the house settling or someone sneaking in. Admittedly, you might not be completely sure and have to either go and check or hide under the bed. But in theory you know. So, my hypothesis. This is a genuine scientific experiment. Not necessarily well-designed, but it is an experiment. I'm theorizing that humans can tell the difference between clear text and encrypted information without consciously processing it, without understanding what the information is. So without seeing the actual information, in this case, I'm saying I think you can

hear the difference between clear text and encrypted text when it's codified into sound. I could be wrong. I've not tested this on many people and we're going to find out. Now, just quickly, this is back to the interrupt. As I was saying, human interrupt requests. Yeah. Anyone recognize this waveform?

It is a sound. Well, you might recognize it in a minute. This is what I think should replace all SOC audio alerts for security incidents.

I mean, come on. That would just get a much better reaction in the SOC. Welcome to the lab. I was going to do a clever interactive quiz where you could scan it with your phones and enter the answers on there, but then I thought that sounded like work. So, we're going to do raising your hands. This is a practice round. This is your chance to hear an example. So, this is a clear text English language sentence which has been encoded into harmonic chords. Nothing special, no other processing done to it.

Each of the clips is trimmed down to 10 seconds, because otherwise some of the encrypted ones would be about an hour, and it would be an easy way to do a talk, but I don't think you'd enjoy it. But that's the clear text one. Now listen to this and see if you can spot a difference.

>> Okay, everyone ready for round one?

Remember that, and compare it to this.

Okay. Who thinks the first one was clear text? >> Okay. The rest of you think the second one was clear text. >> Very different. >> They were different. First one was cipher text. Most of you were right. Now, you probably can't tell me why you might have got that right, but we'll get on to that later. Round two. Don't worry, there's not too many of these. Like I said, it would be an easy way to do a talk, but it would get boring fast.

So again, who thought the first one wasn't clear text?

I did randomly sort these. It's not pre-planned. And again, most of you are right. Final round.

Okay, first one. >> Second. >> Can anyone tell me why you could tell the difference?

>> It sounds a little bit sharper, and more consistent. >> I mean, that shouldn't be the case, because they're generated in exactly the same way. >> One more answer. >> Well, that's an interesting one. Because this is text that's been turned into sound. Why on earth would we get dissonance by encrypting it? We do have a bonus round. This is not text. This is an image that's been encoded to sound, because I wanted to make a point.

[Music]

First one: clear image. Second one: encrypted. Actually, that image. This guy's not even in the room. How can I annoy him when he's not here? But the point is, a bitmap is actually a lot more structured than English, and you can really hear that. So, what you're actually hearing, this is still my hypothesis, still my theory. No one's sponsored me thousands of pounds to go and research this yet. If there are any research sponsors in the room, come and see me at the bookshop later. This is maths. This is the equation for Shannon entropy. I'm not going to go deep into the mathematics now, but I will tell you roughly what it is. It tells

you how many symbols of a message, whether it's characters, whether it's words, whether it's images, you need to have a 50/50 chance of predicting the next symbol. Now, it's useful in all sorts of ways, but with English, whether it's words or letters, it generally sits at around nine. After nine words, you've got a 50/50 chance. If it's the Daily Mail, it's about 5 to 7. If it's Shakespeare, it's about 11. The sample texts I did were all sitting at about 11 and a bit, because they were short samples. But if we encrypt data, what we're trying to do is destroy the pattern. We want to introduce randomness. So when you encrypt data, you would expect

Shannon entropy to go up. Yeah, this is what happens when we encrypt the data. The blue bar is clear text. Now, obviously it's not about the raw numbers. I've normalized them all so that the base clear text is at 100, not the actual number. I couldn't get the labels to go away; ignore them and just look at the patterns. The orange bar is encrypted versions of that text. The green bar is compressed versions of that text. Now, does anyone know why, when we compress, we would also expect the pattern to go away?

Yeah, it's removing redundancy. That's the key thing that compression does. Encryption removes pattern and introduces randomness. But compression removes redundancy for efficiency. Incidentally, the red bar is what happens if you encrypt and then compress. And the purple bar is what happens if you compress and then encrypt. Just a fun fact that one. But Shannon entropy is a really useful concept to understand things like the value of information, whether there is structure in a message, all sorts of other things. The most valuable bit of information is the most surprising. It's the one that gives you the most information about what's coming up. And you may have seen those logic puzzles about having balls in a jar. Let's say

you've got 49 blues and one red. Now, which one can you draw to tell you the maximum amount of information about the next draw? It's the red. It tells you every draw after that is going to be blue. It's also the most unlikely one to draw, but that's beside the point. It's surprising information that helps give meaning to the message. This is not maths, or not real maths. This is more a conceptual equation, so there are no numbers attached. This is Kolmogorov complexity, the definition of it. Kolmogorov was a Russian mathematician, unlike Shannon, who was American, and he came up with this concept of how far you can reduce any input. This is not possible to calculate.

That's why it's conceptual. Now, who's aware of things like the halting problem? Yeah, I'm going to check my notes for this, because I want to make sure I don't trip over myself. I'm not going to go fully into the halting problem, but the concept is that you cannot have a program that will tell you whether another program will run to completion or run forever. The only way you can find out is by executing the program on a universal Turing machine. So that's the halting problem: there is no way to tell whether a program will run infinitely without running it. Now, we all know in reality it's slightly different, because the computer crashes after 10 minutes and everything halts.

But Kolmogorov complexity is about the smallest possible program that produces the output you want. So it is the maximum possible compression of any message. It can't be calculated, but it is related to Shannon entropy, because it's related to structure. So while it can't be calculated, you can get an idea by looking at the patterns. If you can see a pattern in data, it means you can compress it more, without exception. If there's a pattern, you can compress it. And if you can perceive a pattern in encrypted data, it's possible that you can retrieve some of the information. That's why you've got to introduce randomness. Okay. Now, why am I doing all this? Well, first

of all, any questions at the moment before I get all optimistic?
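The bar-chart comparison above is easy to reproduce with nothing but the standard library. In this sketch, random bytes stand in for the ciphertext (good encryption output is statistically indistinguishable from random; the talk used AES-256, which is not reproduced here), and the compressed length doubles as a computable upper bound on the uncomputable Kolmogorov complexity.

```python
import math
import os
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: H = -sum(p_i * log2(p_i))."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

text = (b"It was the best of times, it was the worst of times, "
        b"it was the age of wisdom, it was the age of foolishness. ") * 20

clear = byte_entropy(text)
compressed = byte_entropy(zlib.compress(text, 9))
# Random bytes as a stand-in for ciphertext.
encrypted = byte_entropy(os.urandom(len(text)))

# English sits well below the 8 bits/byte ceiling; compression and
# encryption both push entropy toward that ceiling.
assert clear < compressed <= 8 and clear < encrypted <= 8

# Compressed length is a computable upper bound on the text's
# Kolmogorov complexity: a pattern means it compresses well.
upper_bound = len(zlib.compress(text, 9))
assert upper_bound < len(text)
```

Normalizing each value against the clear text figure, as the slide does, reproduces the shape of the chart: clear text lowest, compressed and encrypted both near the maximum.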

come across, because I studied physics, and I studied quantum physics, and when people spoke about quantum computing and collapsing probabilities and things like that, I thought it was going to be all this clever stuff where you calculate the problem in multiple ways and then collapse the problem. It's not. It's a water-based computer. It's analog computing. It's really depressing. >> So this doesn't work with... >> No, because it's still just the same encryption algorithm. I mean, the encryption algorithm I used was actually AES-256, just run over text. So yeah, it's still the same thing.

>> They are not alphabet-specific at all. Shannon worked with symbols, and symbols can be anything. It can be an alphabet. It can be an image. It could be a series of colors or sounds or scents or whatever you want it to be. Arguably, you could look at something like a kata, a pattern from martial arts, and do the same sort of analysis on it, because it's a set of symbols.

So why do this? Well, we already do it in medicine. Everyone recognizes an EKG, and we use those because, you know, if a surgeon's cut you open and is currently massaging your heart, they don't really want to be staring at the monitor to see whether it's beating or not. We use it in other ways in medicine, but it's all about having that separate input. For those of you who know Terry Pratchett, he used to work in nuclear power, and he once wrote a sentence saying, "Everything in a nuclear power station beeps or buzzes or shrieks or something, because it's the only way to get your attention." So, medicine, we already use it there.

Submarines: sonar is literally sonification. Very literally. And sonar operators do exactly this sort of pattern recognition. Now, we've got algorithms and fancy things to do it these days, but we still put the headphones on them, and there's a reason for that. Now, what about network monitoring? I did run a few tests where I sonified some PCAP files: normal activity, DDoS activity, various other things. It needs a lot more work, because networks sound really boring. Mainly because I was only able to get it to recognize different protocols and map those to sounds, rather than doing anything clever, because I didn't have time. But if you imagine you're in the SOC, or you're in the NOC, you've got

the network activity just blipping away very quietly in the background. A DDoS attack happens, and you will get a spike in volume. You might get certain frequencies amplified, because it will be a massive wave of a particular protocol. So there are things we could do with network monitoring. Sorry, I can try and stand nearer to the mic. Is that better? Okay. Network monitoring, we can use it for that. System monitoring, all sorts of other things. We're good at sound. We are good at processing sound even when we're focused visually on something else. And making use of that is important. But when we get into computer technology, we tend to get obsessed with the visual, and the only sound you get is the

occasional beep. Making use of that ability to background process things could be really useful. No guarantees, but it could be. Uh, also music, you know, I thought those compositions were great. The album's going to be out later this year, right? Thank you for listening. If you do want to chat about it in any long format, the bookshop is upstairs almost straight above us. Come and see us there.
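Returning to the network-monitoring idea: the talk names no tooling, so the sketch below skips packet capture entirely and just renders a list of protocol labels as short sine tones. The protocol-to-frequency table is invented for illustration; the point is that a flood of one protocol becomes one sustained band of sound.

```python
import math
import struct
import wave

# Hypothetical mapping: each protocol gets its own tone.
TONES = {"TCP": 220.0, "UDP": 330.0, "ICMP": 440.0, "DNS": 550.0}

def sonify(events, path="traffic.wav", rate=8000, note_s=0.05):
    """Render a sequence of protocol names as short 16-bit sine beeps."""
    frames = bytearray()
    n = int(rate * note_s)  # samples per beep
    for proto in events:
        freq = TONES.get(proto, 110.0)  # unknown protocols get a low hum
        for i in range(n):
            sample = int(12000 * math.sin(2 * math.pi * freq * i / rate))
            frames += struct.pack("<h", sample)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(bytes(frames))

# A burst of UDP inside otherwise mixed traffic stands out as one
# long unbroken tone, the audible analogue of a volume spike.
sonify(["TCP", "DNS", "TCP"] + ["UDP"] * 40 + ["TCP", "ICMP"])
```

A real version would read packets from a capture (e.g. via a PCAP parsing library) and could map volume to loudness, but even this toy shows the interrupt effect the talk describes: you don't have to watch anything for the anomaly to reach you.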

Now, there's an interesting one there, because a lot of people, when they're reading text, hear a voice internally. >> Yeah, that means you are doing audio processing. >> Okay? >> Because otherwise you wouldn't hear it at all. There is very limited attention that humans can pay to things. You cannot focus on multiple things. So if you're reading text and you're hearing that voice, all of your audio focus is on your own internal voice. Those people, and they do exist, who don't have that internal monologue when they're reading text, firstly, they tend to read faster, because it's an artificial limit we place on ourselves. They also don't have that absolute blockout of sound focus. Any other questions?

>> [Audience question, largely inaudible.]

>> It's exactly that. So, going back to the submarines, engineers on submarines tend not to wear the ear protection, and they suffer hearing damage because of it, but they don't wear it because they become so familiar with the sound of the engine and the other systems that they can immediately tell when something's off-kilter. Now, they don't necessarily know what it is, but they know something's wrong: I need to pay attention to it. So, exactly that. It's making use of that ability to process, to spot anomalies, and to change your focus onto something. It's not about saying you will understand everything that's happening on the network by

listening to it. It's saying you will understand when something different happens, and that's when you need to go and look at it. >> Any other questions?

>> [Audience question, partly inaudible.]

I tested these on AI. It doesn't cope very well; it doesn't do pattern recognition as intuitively as humans do. And this is the thing: we still don't know why that difference exists. All the theory said, "Oh, we're just large language models." Turns out we're not. I don't know if that's a shock to anyone here, but there's more going on than purely looking at massive multi-dimensional information spaces and interpolating. There is something intuitive. I'm not saying it's spiritual or anything else. I'm just saying it is something we have not yet understood and replicated. I was just discussing this the other day. >> Is this partly...?

Most likely, yes. Now, I am not a, whatever they're called, environmental psychologist, or evolutionary anything like that. But sound is a much older sense than our current version of sight. Much, much older. So, if we're going to have something tuned to environmental anomalies, it's going to be the omnidirectional radar system built into our heads, and not the very directional front-facing predator eyes that we have, where we don't care about anomalies. We care that there's food and I want to go get it. Again, just my theory.

I'm not.

>> Anyone want to get Scott? >> I don't know. No, I think Scott probably has a child in his hand at the moment, actually. Last time I saw him. >> I've heard that theory. I think it does make sense. I don't know enough about it to really argue it, but obviously we're all attuned to different things. I mean, I tend to tune out crying children, possibly because I don't have any. Yeah, it makes it much easier to ignore them, or just go, "Ha, look how much trouble they're having." Whereas as a parent, you hear a crying child even if it's not your own. You can tell if it's your child as well. I think that's

almost where it comes in. You actually know if it's your child who's crying. And sometimes you go, "That's not mine. It's fine." And then other times you're like, "Oh, that sounds like a sore one. I'm just going to look." So yeah, I think it just varies from person to person, as opposed to being gender-based. >> Incidentally, cats have roughly the same frequency range when meowing as crying children. I've heard that's where the meow came from. Isn't it that cats only meow around humans? >> They don't have any other thing. >> Yeah. >> What about smells? >> Right. So, I have actually got a whole talk on

information spaces and cocktail flavors, where I go into the theory of smell and things like that. The current theory is there are somewhere between 9 and 11 fundamental smells. Could be as high as 15, could be as high as 50, but it seems to be somewhere under 20. Encoding smells, creating artificial smells, is really difficult. They're also quite volatile. Now, if someone does develop a device to produce on-demand smells, then yes. But I would argue that already exists: if you walk into a data center and you smell burning copper, you immediately know something's wrong. Thank you all. Great. Thank you.