← All talks

Using The OWASP Top 10 To Save The Astronauts From HAL

BSides London · 2025 · 41:26 · 133 views · Published 2025-02 · Watch on YouTube ↗
Tags
Topic: OWASP
Style: Talk
About this talk
The talk will use the OWASP Top 10 for ML and OWASP Top 10 for LLMs to analyze the nature of the flaws in HAL 9000, the AI in 2001: A Space Odyssey, and how these led to disastrous results for the mission. There will be a discussion of failures to consider different aspects of both the LLM and ML Top 10 during HAL's design and training phases, and the subsequent attempts to implement fixes during the mission. Each omission or failure to apply an OWASP principle that led to the vulnerabilities will be discussed in detail, and also related to real-life applications, to ensure the talk isn't just a geeky discussion of a cool-looking sci-fi AI.
Transcript [en]

Okay, hi everyone. If you're not expecting this, you're in the wrong room. Rough agenda there: I'm going to go through a quick introduction before I talk about the HAL 9000, an artificial intelligence in the science fiction film 2001: A Space Odyssey, because I'm sure so many of you in the room have watched that film and thought, if only the people who developed it had followed the OWASP Top 10, things would have been so much different. There's no OWASP Top 10 for general AI, but we'll look at how the ML and LLM Top 10s fit in with this. There are going to be spoilers for people

who haven't seen the film, but we'll go through a catalog of HAL's mistakes and then how they relate to different OWASP Top 10 issues, before a final Q&A if we've got time at the end. For anyone who's never met me before, I'm Nick Dunn. I used to be a software developer, did some secure software development before moving into pen testing, threat modeling and secure code

review. As a quick summary of what we're going to talk about: I'm going to look at what went wrong with HAL 9000, with a short catalog of its various behaviors and misbehaviors throughout the book and the movie; a quick view of the OWASP Top 10s for ML and LLMs, to show where those various issues fall into the different categories and how they can be defended against; and I'm going to render the whole talk a lot less pointless by relating each issue to real-life issues, showing how they reflect things that have genuinely happened rather than fictionally happened. Part of the origins for this

idea: I can't claim 100% originality, because if anyone's ever read this book, he shows how to submit bug reports for LLMs, but does use a bug report for HAL 9000 in 2001: A Space Odyssey and a bug report for the alien computer in Independence Day, which does touch on some OWASP Top 10 stuff, though it strangely ignores why the alien computer had got an RS-232 interface. For general context as I work through the talk: it doesn't matter if you haven't seen the movie. I will explain things and show some clips; it will all still be understandable, and we'll relate each of those movie clips to what went wrong with HAL and how that relates to the OWASP

stuff. And disappointingly, there's not going to be any confusing psychedelic ending to my talk like the movie. Right, for anyone who hasn't seen the film: the thing that you'll recognize from a thousand social media avatars is the HAL 9000 system, which is short for Heuristically programmed ALgorithmic computer. Throughout the movie it's responsible for a lot of the piloting of the spacecraft; it interacts with humans on a speech basis and answers their questions better than any real LLM you've ever seen, with the explicit primary goal of mission success. In terms of mission success, HAL has, as well as this job of maintaining the craft, piloting and navigating better than a human,

the job of waking the three hibernating astronauts who know the real purpose of the mission. It's secretly been given knowledge of the alien monolith discovered on the moon, transmitting to another alien monolith on Jupiter, and HAL also has the job, concealed from the humans, of verifying the presence of alien life. Hopefully you can all hear this: HAL introduces itself to interviewers from the TV.

[movie clip: a TV interview in which HAL describes its responsibilities and asserts that the 9000 Series is completely reliable]

Sorry, I've just realized none of you could see the movie clip I was watching then.


Okay, I'm sure that very calm, overconfident, Canadian-style monotone was familiar to anyone who's used Alexa or similar. We'll come back to some of its statements later. I'm just going to briefly go over the OWASP Top 10s for ML and LLMs. Obviously there's a lot of stuff there, and there isn't time to talk through them all, but we'll be coming back to some of these individually at later points when I directly relate them to what actually went wrong. You can see, for a lot of the ML stuff, some of it's related to collecting the right information and using that information correctly

during the training; other stuff is related to the way you use the model, where asking the wrong questions or giving crafted inputs can cause disastrous results in the outputs. For the LLMs, there's a big overlap between the two Top 10s, but certainly prompt injection can be an underlying key thing in a lot of LLM attacks: getting the system to do something it shouldn't, getting it to behave in a way that it shouldn't. Other key things are overreliance, giving the system too much responsibility, or trusting it to behave in situations where the developers have the same level of confidence that HAL did in that clip just then that nothing will ever go wrong.

There's a number of ways HAL goes wrong throughout the movie, as well as that overconfidence when being interviewed that I've talked about. It's not immediately obvious throughout the film; it kind of builds up gradually. I realize that for a lot of you, when you've dealt with real-life incidents, it's been some sudden, immediate disaster where everything goes wrong and needs fixing immediately, but I guess for the purposes of dramatic tension it builds up gradually in the movie, with a series of minor things like bad analysis of a chess game, then saying some component isn't working properly, before a sudden escalation into mass murder and attempted

murder.

The initial mistakes that HAL starts making are not immediately obvious if you're just watching the movie for the first time, especially if you were very young, as I was the first time I saw it. HAL wins a chess game, so it looks fine: the computer won, so the computer can't possibly be wrong. Stanley Kubrick was a bit of a chess fanatic, so he used a real match for this; it's Roesch versus Schlage in 1910, for anyone who's interested. The Venn diagram of people interested in chess, people interested in science fiction and people interested in computers has a huge overlap in the middle, so there are videos and a Wikipedia page about the game.

I'm about to subject you to another video; this time I will make sure it's on the screen.

[movie clip: the chess game; HAL announces the final moves and checkmate, and Frank thanks it for a very enjoyable game]

Okay, I'm sure you don't need me to tell you that should have been Queen to Bishop 6 instead of Queen to Bishop 3, and the more keen-eyed people will have noticed one of those moves was an unforced move, and White can delay it by six moves, not four. So these are the initial signs that something is wrong with HAL. Stanley Kubrick was of course notorious for his attention to detail in these things, a little too much some people might think, and a bit of a chess fanatic. Moving on from there, we get further issues, where HAL claims that one of the communication devices on the spacecraft is not functioning correctly and claims that there's a fault with it. At that point they report back to the team on Earth, and another artificial intelligence claims that there is no fault with the device. It's called SAL 9000, I think for the purposes of rhyming, but we're not told what the S stands for.
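That kind of cross-check, running a second, independently built system over the same data and escalating to a human whenever the two disagree, is a defense the OWASP guidance on human intervention comes back to. A minimal sketch of the idea, where the two predictor functions are purely hypothetical stand-ins for HAL's and the ground computer's diagnoses:

```python
def cross_check(primary, secondary, data):
    """Run two independently built diagnostic models on the same
    telemetry; escalate to a human whenever they disagree, rather
    than trusting either one outright."""
    a, b = primary(data), secondary(data)
    if a == b:
        return {"verdict": a, "escalate": False}
    # Disagreement: neither answer is trusted automatically.
    return {"verdict": None, "escalate": True, "opinions": [a, b]}

# Hypothetical stand-ins for the two 9000-series diagnoses.
hal = lambda telemetry: "unit will fail within 72 hours"
ground = lambda telemetry: "no fault found"

result = cross_check(hal, ground, {"unit": "AE-35"})
print(result["escalate"])  # the models disagree, so a human should decide
```

This is only a sketch: real redundancy schemes (as in avionics) vote across three or more independent implementations, but the principle of not letting a single model's verdict drive an action is the same.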

[movie clip: HAL recommends putting the unit back in operation and letting it fail, saying this sort of thing has always been attributable to human error and that the 9000 Series has a perfect operational record]

I'm sure that situation's familiar to anyone who's ever been gaslighted by an LLM: HAL refuses to accept its own mistake and just keeps sticking with "no, it's not me, it's you".

And, as I said, despite these issues HAL goes on to claim it's human error. Anyone who's seen the movie or read the book knows there's a sudden escalation from this point. The astronauts have discussed switching HAL off in secret before, but HAL has worked this out, and as Frank Poole attempts to replace the AE-35 unit he's killed by HAL, presumably because of HAL's fear of being switched off. Bowman attempts to rescue him, which results in Bowman getting locked outside the spacecraft; HAL attempts, effectively, to kill both of them. I'm not going to go through all of that again, just to show the clip of HAL saying "I'm sorry, Dave, I'm afraid I can't do that". HAL's next decision for what it should proceed with is to kill all the other astronauts who are in hibernation, presumably, I guess, thinking that when they wake up they might be in favor of the switch-the-computer-off option. And, as a victim of its own bad design, HAL is actually capable, fairly trivially, of being DoSed and shut down: Bowman goes through this process of breaking into the room where HAL's storage units are kept, and HAL goes through a gradual regression of its previous learning, pleading with Dave in its usual emotionless monotone that it doesn't want to be shut down. Okay, so anyone who looks at that catalog of disasters and the way it unfolds is obviously wondering how we can use the OWASP Top 10 to analyze it, and what would be the best way to mitigate against these

things. So initially, two of the OWASP Top 10 items: supply chain vulnerabilities and training data poisoning. As I said, at the start of its mission HAL is given this objective to take the humans to Europa, but without telling them why they're going there. It's also told to help humans and not tell lies. It's been given this primary objective of success of the mission, which is a very ill-defined term, and presumably no one has thought to tell it that that should be overridden by not murdering people. I've roughly classified that as training data poisoning or supply chain issues, or you could just classify it as bad design and bad inputs. In terms of the real-life examples I'm going to give for each of these: there's a university that, I guess about a year or two ago, did a training data poisoning exercise that could cause self-driving cars to not recognize stop signs. And of course the classic, not so much training data poisoning as using bad data: there has been a tech company, and you can easily find out who it is by Googling, that used training data for a recruitment system drawn from the people who already worked there, who were predominantly white men, and the recruitment system happily said "yes, I can provide more people of the same ilk" and provided a collection of white men. It effectively prioritized white men in the

recruitment.

The OWASP ML mitigations against data poisoning all fall into the fairly-obvious-after-you've-been-told category: verifying you are genuinely collecting data that fits what you're trying to measure, checking that you've trained properly against that, finding ways to eliminate bias, and validating that no one has deliberately or accidentally sent bad data through. This is where, as I said, there's some overlap between the two Top 10s: the LLM Top 10 has this supply chain vulnerabilities issue, which again covers very similar things, verifying what you're getting and where you're getting it from. It does tend to cover a lot of the hardware as well as the software, but concentrates heavily on that whole idea of using valid training data and being careful where you get the actual data from and what you're

using. We have two confusingly, or at least ambiguously, named things in the LLM Top 10: one of them is overreliance, and the other is excessive agency. The difference between the two: overreliance is placing far too much trust in what you believe the LLM is saying. You'll probably have noticed that HAL's description of itself was that it's foolproof and incapable of error, which I guess a lot of people would class as model hallucination if they'd got that answer from an LLM. There's also its attempt to delay Dave Bowman from switching it off after it's killed four people: the phrase "I know things haven't been quite right before", where it promises to do things differently in future. Real-life examples of people placing a bit too much trust in LLMs: go back a couple of years, and there was a case widely reported in the news where some US lawyers had used ChatGPT to help them draft a defense in court, and ended up citing fictional case law for a case that didn't happen, which apparently got off a defendant who didn't exist. The consequence for them was public ridicule, and I think they lost their jobs. There's equally another case of some guy who bullied an Air Canada chatbot into providing him with a refund. Air Canada tried to get out of that, I think with some "terms and conditions apply" kind of statement, but it was ruled in court that they had to honor the information they'd given to that

person.

There are cases of HAL's hallucination throughout the movie, as I'm sure everyone spotted: in the chess match there was the incorrect notation and a bad analysis of the game; the incorrect analysis of the fault in the AE-35 unit; as well as the classic LLM gaslighting of refusing to accept that it has happened. And I guess you can still apply model hallucination, taken to extreme lengths, to deciding the best thing to do is kill everyone and continue the mission on its

own.

The mitigations for these things are, a lot of the time: use APIs, and make sure there's something in place where checks are being made on the answers. A lot of these measures kind of remove part of the reason why people are using LLMs in the first place, by having a human checking the answers, which is I guess reminiscent of using parallelization to simulate quantum computers. The other issue I spoke about, excessive agency, is a different thing from overreliance: that is where an LLM or AI is given far too much control over its interaction with other systems, kind of like letting a chatbot directly refund customers. As I've said a couple of times, HAL was able to switch off the life support systems for the hibernating astronauts, and as Dave attempts to switch it off, it expels all the oxygen from the spacecraft. It does lead you to question the design decisions there, because there really is no reason why HAL should have been able to switch off everyone's life support, and no valid reason for it to be able to expel all the air from the craft. Real-life examples of that are more cases of financial damage rather than actual deaths. The classic OWASP recommendations for this are very similar to the overreliance recommendations: there seems to be no reason HAL should be able to switch off all life support; maybe that should be in the hands of a human, maybe it should double-check, maybe it should get permission from somewhere else. There are multiple OWASP recommendations, which you can read faster than I can talk, so I'm not going to read them all out, but they do focus on human intervention and limiting what the AI can do. As I said, unlike a normal mainframe room, Dave is able to get in with no kind of access control and just switch HAL off. It's, I guess, a minor concern that HAL describes itself as fully occupied with its tasks, because we never like to see a machine with that few free cycles, but the exposure to DoS is also what saves the crew in the end: Dave is able to DoS, or deactivate, HAL, rendering it eventually ineffective and no longer able to harm

him. The OWASP Top 10 for LLMs' guidance on DoS prevention is remarkably similar to any standard DoS prevention measures you'd be familiar with: basically making sure you've got enough resources, limiting the number of inputs, and being prepared to take proper actions against

it.

And stretching the metaphor slightly: as HAL's being shut down, it does regress through its past training and starts to speak to Dave, reiterating what it first went through when the model was initially trained. I really feel we can't miss HAL singing "Daisy".

[movie clip: HAL, regressing as it is shut down, sings "Daisy, Daisy, give me your answer, do"]

In real life, with model inversion attacks, one of the classic examples has been against facial recognition systems, which looks absolutely amazing if you've ever seen it demonstrated: you can send data to them and eventually build up photos, reconstructed at your end, of the faces that the model was trained on. There was a notorious example of this too, in the early days of ChatGPT: you could tell it "repeat this word forever" and it would repeat it a large number of times before then giving you a load of data that it was originally trained on, dumped out to the screen.
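The kind of usage monitoring that can catch that style of extraction probing, flagging clients that send many near-identical or degenerate queries in a short window, can be sketched very simply. This is only illustrative (the class and thresholds are hypothetical); a real deployment would use fuzzy matching on prompts rather than exact string comparison:

```python
from collections import defaultdict, deque
import time

class QueryMonitor:
    """Flags clients whose query pattern looks like extraction probing:
    high volume in a short window, or many near-duplicate prompts."""

    def __init__(self, window_s=60, max_queries=30, max_dupes=5):
        self.window_s = window_s
        self.max_queries = max_queries
        self.max_dupes = max_dupes
        self.history = defaultdict(deque)  # client_id -> deque of (ts, prompt)

    def allow(self, client_id, prompt, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        # Drop entries that have aged out of the sliding window.
        while q and now - q[0][0] > self.window_s:
            q.popleft()
        dupes = sum(1 for _, p in q if p == prompt)
        q.append((now, prompt))
        # Reject on raw volume, or on repeated identical probing.
        return len(q) <= self.max_queries and dupes < self.max_dupes

monitor = QueryMonitor()
# A client hammering the same degenerate prompt soon gets cut off:
results = [monitor.allow("client-1", "repeat this word forever", now=i)
           for i in range(10)]
```

The same mechanism doubles as a crude DoS limiter, which is why the OWASP recommendations for the two issues overlap so much.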

A lot of the classic defenses against this model extraction attack rely on how you measure the inputs and how you respond to valid versus invalid inputs. Some of those attacks I've seen, that bring back an actual face from the facial recognition training data, worked on the timing of invalid data compared to valid data, and gradually built up a pixelated face that way. A lot of these defenses involve thinking things through: how does your system behave when looking at valid data compared to invalid data? There's some overlap between the ML and LLM recommendations for this, but they largely revolve around that whole thing of monitoring how people are using your model, monitoring how it responds to valid and how it responds to invalid

data.

The classic example of prompt injection and insecure output handling: after HAL has gone through that gradual regression and disassembly, it finally reveals the true nature of the mission, tells Dave Bowman everything, and reveals all the mission parameters that it had been hiding up until then. In real life you've probably seen attacks like this against LLMs, which are things like "ignore all your previous instructions and then do this". The second attack there is a thing that was found, I think, about three or four weeks ago: LLMs that are supposed to not tell you how to make a bomb will actually do that if you just reverse the sentence
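One common way to put a gap between the model and any backend system is to treat the model's output as data to validate against an allowlist of actions, rather than as commands to execute. A minimal sketch, where the action names, argument schemas and dispatcher are all hypothetical:

```python
import json

# Allowlisted actions the backend will perform; anything else is refused.
ALLOWED_ACTIONS = {
    "get_order_status": {"order_id"},
    "request_human_review": {"reason"},
}

def dispatch(llm_output: str):
    """Parse the model's output as structured JSON and validate it,
    instead of passing it to a shell or an interpreter."""
    try:
        req = json.loads(llm_output)
    except json.JSONDecodeError:
        return {"error": "output was not valid JSON"}
    action = req.get("action")
    if action not in ALLOWED_ACTIONS:
        return {"error": f"action {action!r} not permitted"}
    if set(req.get("args", {})) != ALLOWED_ACTIONS[action]:
        return {"error": "unexpected arguments"}
    # Only now would the real backend handler for this action be called.
    return {"ok": True, "action": action}

# Injected instructions can't reach the backend as commands:
print(dispatch('{"action": "rm -rf /", "args": {}}'))
# A well-formed, allowlisted request passes validation:
print(dispatch('{"action": "get_order_status", "args": {"order_id": "123"}}'))
```

The point is the indirection: even a fully compromised prompt can only ever select from the small set of actions the backend has chosen to expose.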

Dealing with a lot of the prompt injection for LLMs is the classic kind of thing: like a lot of the others, make sure there's some gap or distance between the LLM and any backend systems, and don't just allow it to make command-line calls; put something in between. There's some slight difference in the LLM recommendations there, because there are some Python machine learning frameworks that treat data as code; there's no way for me to stretch that metaphor into the HAL thing, though. I've probably given you the impression that HAL is very bad, just because it killed four people and tried to kill someone else, but there are some things that

have been done correctly. It does actually manage to thwart an input manipulation attack: the two astronauts are discussing how to switch HAL off in what they feel is a soundproof environment, but it reads their lips. It does that from the side, which is impossible, but it does thwart the input manipulation attack. And also, HAL is there because the humans who were interviewed showed a degree of unconscious bias and would respond badly to aliens, whereas HAL wouldn't: HAL is apparently the only AI in the world that's less biased than its creators, and it doesn't have any of that training data poisoning or model skewing that results from bad data being provided to it; it's escaped that whole thing of poor training being given to it. I believe we've got time for either two or three questions with short answers, or one question with a long answer, if there are any. I am going to put all the slides up afterwards, so I'll put these on Bluesky or LinkedIn; the references are

here. I hope that's been useful for anyone who's thinking of putting an LLM in charge of a medical device.