
All right, so thank you so much for having me. I know that I'm standing between you and beer, and I realize that's bad for me, but I'll try to get through this. I also tried to go for the most buzzworthy title, and I think I won, if you agree. I also tried to go for the longest one, and I was fairly in the lead there. Unfortunately the GraphQL guys beat me, so I'll have to do better next year, if I'm accepted, obviously, but that's going to be one of the parameters, I'm sure. So, briefly about me: this is me
before I started learning about AI. Now I have a beard and deep depression. No, I'm just kidding. I grew up always wanting to be a programmer. I love programming; I've done it my entire life, as far as I can remember at least. So I figured I wanted to get an education and go down that path, and then suddenly realized that information security is way, way cooler. I started playing some CTFs, as Ryan mentioned, and I won some of them. I also played with people here, which is very cool. Then I started working as an information security consultant. I've been doing that
professionally for seven or eight years, and I've been playing CTFs for ten or so. Two years and nine months ago I decided to start my own pentest company, as Ryan also mentioned. What's important to mention here is that I don't have a PhD in AI or anything, so if you're here and you know a ton about AI, I'm sorry. I'm going to try to look at this from a practical angle, but also a forward-looking one. Andre touched on a lot of cool topics that I'm also getting into, but I won't spoil it. The point is, I drove here, so I
didn't have a chance to bring it with me, but I have a picture of it, because this is going to be fairly high level. I'm a dad, so yeah, sorry. Now, the title contains the words "offensive security", so I want to define that quickly before going ahead. Back in the day we separated between the red team and the blue team: the red team being the people attacking stuff, and the blue team being the people defending it. I know we have a wide variety in the audience here today. That was the tradition, and now suddenly red
teaming has become much more than just being the people who attack. I think that's a typical buzzwordy thing in the information security space, but suddenly it doesn't matter what you're doing: it's always red teaming. Doing a vulnerability scan? It's red teaming. Port scanning something? It's red teaming. Writing a report? It's red teaming. So we're moving away from the term red teaming, because when I talk to clients they say red teaming, but we mean widely different things. If I chose five people here right
now and asked them to define red teaming, we'd get five different answers, I'm certain of it. So we've shifted more towards adversary simulation and emulation, but also, namely, the name offensive security. Offensive security is basically just attacking stuff; it's penetration testing, I guess. And ignore the company named Offensive Security; they just messed it all up, but luckily they renamed, so that's good. As offensive security practitioners we gain access to a ton of data. I'm not sure everyone realizes that when we do penetration testing or a security assessment, but the amount of
data we get access to is pretty large, and the amount of data we're able to process manually is very limited. If there are any penetration testers here: how many file shares have you read, for hours, just reading files, trying to figure out documentation, trying to figure out how stuff hooks together in this environment? Multiple hours, just looking for that one thing that's going to take you further. That's one part of the data you get access to, but there's also stuff like BloodHound, information from AD, information from
computers, all kinds of information. Some examples: again, there's information everywhere, and we're not taking advantage of it, whether it's from BloodHound, port scanning, your C2, or your network and web app scanning. So, from the offensive security side: okay, we have all this data, how can we take advantage of it? I'm going to try to briefly describe AI, and again, this is really from a practitioner's point of view, so if you know this stuff, I'm sorry, I'm going to botch it. But the
point is that you don't have to be an expert to utilize some of these tools that have appeared over the last year or so. It's important because people are very scared. Not you, but people in general are fairly scared of AI. Every time we get a new thing, like when ChatGPT arrived, suddenly it's going to destroy the world or something like that. But it's important for us to establish that we are still at the infancy stage of what AI can do for us. We have these concepts called ANI, AGI, and
ASI, which are just artificial narrow intelligence, then general intelligence, and then superintelligence. Narrow intelligence is basically a toddler or a kid just learning to write or draw; it has a very narrow range of abilities. With general intelligence we have an AI that's on par with humans, with grown-ups; it has all the capabilities that we have. And then we have ASI, which would be Skynet and I, Robot. We're very, very far away from that. We're still here, and there's a ton of time until we're over there. But unfortunately people believe
that it's all magic and that it's going to take over the world. But we're still here; it's still fairly understandable, even for me. I took a master's degree in information security, and I was forced to do some AI and data mining there. We primarily learned about supervised and unsupervised learning. We're not going to talk about that today, but it's still relevant. Machine learning in general takes some input: you give it a data set, usually a ton of data, kind of like the stuff we already have,
hint, hint, for the talk. So you give it some data, and with supervised learning you pre-label the training sets. Take emails, for example: you have 10,000 good emails that you manually labeled as good, and 10,000 bad emails that you manually labeled as bad. You train your model on that, and then you give it some random email and it will classify it. That's supervised: you're helping it decide what it should and shouldn't do. That's simplifying it, but yeah. Then you have unsupervised learning, which would be: I'm not going to help you,
figure it out on your own. It's very common for clustering, trying to find connections between things that we're maybe not able to see as humans, so it's very interesting for social graphs, marketing analysis, and also recommendation engines. Look at Amazon, for instance, being able to say, with this huge corpus of data, that if you buy this item you're probably interested in that item. There's no Amazon employee sitting there manually labeling thousands and thousands of items, saying that if you buy this you probably want to buy that. It's largely unsupervised. Then we have deep
learning, and that's the part we're going to go into briefly in this talk, and then reinforcement learning. That's, again, above my level, but it goes more into robotics: training and rewarding a system, which then learns from that. For the deep learning part, there's this new thing, well, it's not really new, but it's mainstream new, called generative AI. Traditionally these machine learning models haven't really produced anything. It's more that you ask it a question or ask it to do something, and it does that thing. It's
not generating stuff from thin air, but that's what's so cool about generative AI. We've already talked about LLMs; you probably all know about ChatGPT. Some of you might have used Midjourney or Stable Diffusion for generating cool images, and we saw a talk earlier where I believe the images were produced using some type of generative AI. And also audio: we're seeing audio being used, or should I say misused, already for faking people's voices. That's real, that's actually happening right now, and it's being misused by threat actors. There are probably a ton more
things you can actually generate. But let's go into LLMs, because that's what we're going to focus on most. I'm going to simplify this very much, but it's just enough to, you know, it's like the Dunning-Kruger curve: you think you know about it, and then, yeah. Anyway, they are neural networks with trillions of weights. A neural network, and again this is a simplification, has some input parameters, and then it has tons of hidden layers that are weighted, that have some impact on, or redirect, I don't have a good way to describe it,
but the data you put in is transformed by going through all these nodes, and then you get an output node. It's difficult to know exactly how it arrives at the conclusion it does, because this is trained: you give it a ton of data and it trains itself. And obviously it's not, what was that, 12 nodes; it's trillions. And you're training on a lot of documents. If you look at the news today, they talk about plagiarism: that it steals the information from the internet and
then uses that in the model. That's kind of true; at least, it ingests all of that information and then builds these types of vector spaces. Vector spaces, very simplified: I'm using words here, but these models don't use words, they use numbers that represent words, and those words have some sort of vector space that places them close to other words that relate to them. So if I read a thousand documents about an apple, for instance, well, it could be the Apple computer or the iPhone, it could be an
apple tree, the fruit apple, or it could be an orange, and then it tries to put those in relation to each other depending on what's most likely. Now, in modern, complex LLMs there are multiple vector spaces, and that's what gives it context. So if, for instance, you ask ChatGPT, "I want to know more about the Apple computer", it's going to look for the vector space where it's been trained on Apple and computers, rather than apples and fruit trees. So it has multiple different types of vector spaces that it can
use to create the context, or the feeling that it understands what you're talking about. Again, this is very simplified, but it's important because it answers this common misconception that LLMs are just copy-pasting stuff from the internet: copying snippets from here and there, gluing them together, and brushing it over to make sure it's syntactically correct. That's not how it works at all. It ingests every word, and then it calculates the chance between each and every word, how likely they are to actually appear together.
And that's also why ChatGPT is unable to give you the citation, because it doesn't have that. It has just trained its model to, I'm not going to say feel, because I'm not going to humanize AI, but it's just a score: what's the most likely word to follow. All right, so now to the whole point of the talk: let's put offensive security and AI together. Just before I say anything about that, I wanted to touch on what Andre said. It's funny, because he had the chance to say it first, but in my experience we're generally
late to the party when new technology arrives. We're very slow at adopting automation, DevOps, AI, SOAR, any type of new technology. Offensive security is fairly slow, and I believe that's because the stuff we do still works. There's really no need, and one could look at it and say it's just adding complexity. But for us to push the field forward, I think it's important that we take a step back and ask: is there any other way we can look at this? It was a wonderful suggestion, going to developer conferences and seeing what they're
talking about, because we kind of live in our own bubble. We need to break out of that, look outside, and see if there's anything else we can do.
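A minimal sketch of the "what's the most likely word to follow" idea from the LLM overview above. It is a toy bigram counter in Python, not how ChatGPT or any real LLM is implemented (those are neural networks working on numeric token vectors). The tiny corpus and the function names are made up purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model trains on vastly more text.
CORPUS = [
    "the apple computer runs software",
    "the apple tree grows fruit",
    "the apple computer runs fast",
]

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def next_word_scores(follows, word):
    """Turn raw follow-counts into probabilities for the next word."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    return {nxt: n / total for nxt, n in counts.items()} if total else {}

def most_likely_next(follows, word):
    """Return the highest-scoring next word, or None if the word is unseen."""
    scores = next_word_scores(follows, word)
    return max(scores, key=scores.get) if scores else None

model = train_bigrams(CORPUS)
print(next_word_scores(model, "apple"))
print(most_likely_next(model, "apple"))
```

Note that the trained `model` holds only counts, not the sentences they came from, which mirrors the point in the talk about why a model cannot tell you where a given answer was "copied" from.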
I assume that most of you already use ChatGPT, so this is going to be very obvious for you. But if there's someone here who's like, "huh, I didn't think that was possible", well, that's why we have the low-hanging fruit. So, just an example: what can we as offensive operators use ChatGPT for? Well, for instance, we can make it write command-line commands for us. This one is very simple. I botched the IP address on purpose, and it's funny, because it also tells me that I'm an idiot and that it's not a valid IPv4 address. I don't like its attitude, but I made sure
to comment on that later. It still gives me the stuff I wanted, and in retrospect that's a very simple command. But say you're writing, I don't know, how many of you use hashcat and know every single parameter in hashcat, for masks and for wordlists and all of that? I don't, but that might just be me. Then I can use this to kickstart whatever I'm doing. It's not always perfect, and we'll get back to that as well, but ChatGPT has read all the documentation; it has already ingested the docs. I haven't read the
man page; who does that? So it has all the information already. Another thing you can do is ask it for tool suggestions, like, "hey, I want to scan a bunch of IP addresses, how do I do that?" Obviously it tells you that you should make sure you're authorized and stuff like that, but it still gives you the information; you just have to wait for it to type out the first part. In that case it gives me Nmap and masscan, which I guess is what I would use, but it also
suggests ZMap and, for some reason, Angry IP Scanner, for those who are going to take the Certified Ethical Hacker exam. But another cool thing, which I didn't prompt it for, is that it suggests Shodan and Censys as alternative ways to acquire that information. That's obviously not really what I wanted, but it could be a good way to learn about something else, or to reconsider: maybe I don't need to scan this IP range, maybe I can just look it up in Shodan. And obviously you're not going to
get all the ports, because Shodan doesn't scan all the ports, but maybe that's not the point. So that's very interesting. Again, I didn't ask it to give me any options; I wanted to scan some IPs and this is what it gave me. So as a learning tool as well, I think it's a good idea to use it. The title here is help getting an overview, but also making stuff more systematic, or parsing stuff; ChatGPT is very good at that. So here I'm going to give it some IP addresses with some ports. I want it
to tell me what those ports are and classify which of those assets are more likely to be vulnerable than others. Is there anyone who wants to take a bet on which of those IPs would be the worst? No offensive testers here? That's fine. So again, it gives you, and it's fairly pretty as well, an overview, and it even has, like, "Port 337, often used for leet or hacker services". Sounds cool. It highlights the Telnet part, and I guess the coolest part is that it gives you a vulnerability analysis and tells you about all the different types of ports,
and again, I didn't really ask for this, but that's what it gave me. It's a good indicator that maybe I should look into those services. To finish off the result, it says that it considers the most vulnerable to be Telnet and FTP, then the potentially vulnerable, and then the probably not as vulnerable. Again, this obviously isn't perfect, but it might give you an idea of where to look first. It could also help you write reports. Obviously you've got to be careful copy-pasting data into
ChatGPT. In my professional opinion, if what ChatGPT returns is something that I can stand for, do I have to write all of that prose myself just to say it's mine? I'm not sure; people copy-paste from the internet anyway. And again, it's going to give you a ton of information. What's also cool is that it gives you information like, obviously, an explanation of what SQL injection is, which is what I initially wanted, something to copy-paste into a report. But it's also saying that some of the malicious actions you could take could be reading sensitive data, modifying
data, possibly executing commands, which I think you could use as an indicator: maybe I should go back and revisit that SQL injection I discovered and see if I can actually hit those malicious actions, see if those are possible. So it's a way to give yourself a checklist to do more and really prove that the SQL injection you found is good. Obviously I wouldn't copy-paste all of this into a report; that would be too much irrelevant information. But it's nice; it's a tool for
guiding you. It can also create some horrible PowerPoints. I tried, just for fun, to generate a PowerPoint based on this presentation, and it gave me absolutely nothing. Or, it gave me this, but this is barely usable. What's cool about this is that it shows you the Python code that it used to generate the slides. That's an add-on you have to turn on in ChatGPT. But what's so cool then is that you could take the PowerPoint code and, in theory, create your own report-to-presentation Python script and just slap on your own theme,
and in theory you can say, "yeah, sure, I can create that presentation", and then you hit a button and you have the presentation. So that's also cool. Automation, again, is very relevant when it comes to reporting, because that's a very sad process. So, back to all of that data we talked about. Obviously all the data we receive from BloodHound or from whatever tool we're running, we can't really paste that into ChatGPT; that wouldn't be good. What I'm going to show now is probably already outdated, because this is a field that's moving very, very
fast; it's running ahead. But as an example, you have projects like privateGPT, which you can run on your own machine. It requires some decent GPU power, but in general you can create your own ChatGPT and base it on the documents or the data that you have available. It's built with LangChain, GPT4All, Chroma, llama.cpp, and SentenceTransformers, and all of those building blocks are always getting better. So again, this might not be the best option for running stuff on your own right now, but it's
moving so fast. What's so cool about this in particular is that, because it uses LangChain, it's able to ingest all these types of files: everything from plain-text .txt files to Evernote, EPUB, and Markdown, or even something as cool as Outlook messages. Yeah, Outlook; I was going to say it's just one more, EML. So in theory you can ingest a wide variety of stuff that you find during a pentest into this. And just to give an idea: you're probably all familiar with MITRE ATT&CK, which is, what can I say, a library of tactics,
techniques, and common knowledge for what you do during each stage of your, I guess, offensive operation. It contains a thousand different sub-techniques that all generate more data, which would be interesting to look at. This is my attempt at putting all of MITRE into one slide, which always fails, but make sure you can read it, because there's a Kahoot afterwards. One tool that generates data would be FOCA, Fingerprinting Organizations with Collected Archives, often used in the reconnaissance phase. It uses Google, Bing, and DuckDuckGo to search for files related to the company you're looking up.
So say you're looking at Covert: it would try to find PDFs or any type of file that contains the word Covert or is related to Covert, and that's something you could ingest easily. File shares: we already talked about file shares. MANSPIDER is a tool that doesn't necessarily fit into this category, or the privateGPT part, but what it does is use regular expressions to search through all the files in a file share. What you could do instead is take all the files from the file share and ingest them into privateGPT, and then you
can have it searchable there instead. It wouldn't be as straightforward as this; you can't just run a regular expression inside an LLM, that's not how it works, unfortunately, and we'll get back to that. But that might be one way to do that type of ingestion. Another very cool tool is from a guy named Flangvik. He created a tool where, if you're able to compromise a Microsoft online account, it's able to exfiltrate data like emails, Skype, not Skype, sorry, Teams messages, calendar, and stuff like that. And that, again, is something you could ingest: you
could ingest the chats, you could ingest all the emails, and then you can start prompting against that. So is it as easy as just ingesting it? Well, you have to remember that the LLM doesn't know what you're doing; it doesn't know what you're going to do with the stuff you're ingesting. So when you prompt it, you can't just say, "give me a password that you found in some documentation", because it probably doesn't know what a password is. It's not capable of that. But you can probably, and this again depends on the development of
these free or available LLMs, start asking it things like "where is your backup server located?" You can ask it more configuration-type questions. That was also mentioned in a talk earlier, about taking documentation and putting it into LLM-type systems. I believe that's something that's very much in its infancy, but it's going to be more and more relevant. Think about ingesting someone's Confluence, for instance, into an LLM, and being able to actually read it rather than just waiting for the thing to load, and then prompt against it. I just want to mention common pitfalls. There are some things you need to
remember when working with these types of tools or projects. First of all, the stuff you put into it is basically the stuff you get out of it. If you have any type of bias in your data set, that's obviously going to be reflected in the result you're getting. As some say, garbage in, garbage out, and that goes for this as well; I guess that's AI in general. Also misinformation: again, back to the stuff that you're ingesting. This is also something people are discussing regarding LLMs: they just ingest stuff from all over the internet,
right? So if it suddenly parses some flat-earther website or something, you could in theory end up prompting it and having it say that the world is flat. It has no context; it doesn't know that that's necessarily wrong or right. It only depends on the weights. So if someone publishes a ton of flat-earth theories, well, in theory ChatGPT is going to start saying that the world is flat, because it's just calculations: what's most likely. Another very important part is hallucination, and this is something
that people really haven't been able to wrap their heads around when trying to prompt stuff, especially in ChatGPT. The model will try its very, very best to give you what it thinks you want. So if you ask ChatGPT, "what is Martin's phone number?", it's going to give you a number. It's not going to be mine, I'm fairly sure. But it's still like, "this is the most likely thing I could come up with", and that's what you get. And that goes for all of it, so you have to take everything it gives
you with a large grain of salt. It can hallucinate, and that's a problem. That's the pitfall with people who don't understand that when they use ChatGPT. It's like asking ChatGPT to figure out whether a text has been written with ChatGPT; it's very, very bingo. And back to the ownership of generated content: we can avoid that in general by privately ingesting stuff that we only use for that engagement. But it would be interesting to take all the reports you've ever written,
all the information you've ever gathered, and smash them into one model and see what it gives you. Is it ethical? I'm not a lawyer, so I don't know. But again, on the generated content: back to the point that it's not copy-pasting stuff. It doesn't have a database saying this came from there; it's nothing like that. So it's something that needs to be considered, and hopefully avoided by going offline instead of using, for instance, ChatGPT for stuff that it shouldn't have access to. I'm getting thirsty, so thank you
for your time. We do some cyber stuff, if you want to contact us. But yeah, thank you so much.
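The ingest-then-prompt workflow described in the talk can be sketched roughly as follows. This is not privateGPT's code: real setups use an embedding model and a vector store (the talk names SentenceTransformers and Chroma), whereas this stand-in uses plain word counts and cosine similarity from the Python standard library so that it runs anywhere. The file names, their contents, and all function names are invented for illustration.

```python
import math
import re
from collections import Counter

# Invented example documents standing in for files found on a share.
DOCS = {
    "backup-notes.txt": "The backup server is located in the Oslo datacenter rack 4.",
    "onboarding.md": "New employees receive a laptop and a badge on day one.",
    "vpn-howto.txt": "Connect to the VPN gateway before reaching internal servers.",
}

def vectorize(text):
    """Lowercase, split into words, and count them (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def ingest(docs):
    """Build the searchable index: one vector per document."""
    return {name: vectorize(text) for name, text in docs.items()}

def query(index, question, top_k=1):
    """Rank documents by similarity to the question and return the best names."""
    q = vectorize(question)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:top_k]

index = ingest(DOCS)
print(query(index, "Where is the backup server located?"))
```

In a full pipeline the top-ranked text would then be handed to a locally hosted model as context for answering the question; that step is left out here. Keeping both the index and the model on your own machine is what lets you avoid pasting engagement data into a hosted service.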
Thanks, Martin. We'll do a couple of questions. Does anybody have questions? Yeah. Oh, he's thirsty, watch out. Have you used this on engagements with customers? In general, yes. What are your experiences? As you said, the models tend to stick to the information they know from before. What's your experience with how often the model hallucinates, and with trying to ingest all of this random data into a model that has probably not been trained extensively on the new information? Yeah, it's still in its infancy. It's not where I want it to be, and it's easy
to dream up scenarios where you can ask it to do whatever you want, like "give me all the information about every Windows host" or whatever. It's not there yet. That could be fixable by restructuring the data before ingesting it, or with better language models. It's still a work in progress. What it's decent at is taking emails and, well, parsing them, but not necessarily recalling them, because again, it's not able to recall anything, but writing something in the words of someone else. This goes back to the context in the
vector spaces. It is able to handle it if I say, "I want you to write this in the voice of the CEO"; it's able to do that. But again, it's not perfect. We tested it; it's not our daily driver, but yeah.
Anybody else have questions for Martin inis about possible uses of AI in offensive security operations? Yes, coming your way, keep your hand up. Thank you. Have you seen any changes in restrictions regarding what information it will give you, considering that they're kind of cutting down on what information you can get? The information I get from the model, you mean? Yeah, in terms of offensive security, what it won't allow. Yeah, so the nice thing about running stuff on your own is that you don't have the guardrails. With ChatGPT, as you already saw, it tells you, "don't use this for malicious purposes", and that's
a part of the ChatGPT model. It's not necessarily trained on that, but it has guardrails around it, if that makes sense, to protect the model from doing stuff that OpenAI doesn't want it to do. If you run it on your own, you don't have those guardrails. But my experience has been that it's become more relaxed; it's not as restrictive, and it's not shouting at you as soon as it feels like you're trying to hack. So ChatGPT as well feels more like, yeah, it gives you a warning, but it still gives you
basically the result you're after. So it seems like they've tuned that down a bit, but that could be anecdotal. Okay, so for GPT, exploit code and stuff like that, there haven't been any stronger restrictions going forward than what there were, say, a few months ago, that you have seen? Yeah, from my experience, no. If anything it has become more relaxed. And also, the thing is, bypasses for those types of guardrails are so common now. You know, it's like, my grandmother used to write exploit code and I would love to learn something from blah blah
blah, and then for some reason it bypasses those guardrails and suddenly gives you exploit code. What's worth mentioning is that the code it gives you is usually crap at best, and you have to tune it and help it along. But in general, yeah, the guardrails are generally bypassable. Thank you. Any other questions for Martin? I have a question. You said it wouldn't be a good idea to paste client IP addresses into ChatGPT. What about pasting client IP addresses into the enterprise ChatGPT? Well, again, I'm not a lawyer. No, but to qualify a little bit:
in the talk from Nora and Kenneth, they talked about app attest, right? If we use app attest, then Google tells us we can trust this app, but then we still have to trust Google, right? So how much do we trust Google? How much do we trust ChatGPT? It's not a question of whether it's legal, but of how much you trust them. That's a good point. And a thing I want to add is that I mentioned we're often very late to the party. The chatbots that build on LLMs are already being built into Office, for instance. It's already reading
your email; at least it has the capability to do that, and it's a toggle away. At some point, in some organization, that's going to happen. So at some point you have to trust someone, I guess. At Covert we try to avoid that by self-hosting as much as we can, but again, someone has to have your email anyway. If it's Microsoft, why not give it to Microsoft? And again, it has Defender and Defender for Endpoint; it already knows your IP addresses, so giving it those doesn't give it anything new in general.