Navigating the AI Frontier: Investing in AI in the Evolving Cyber Landscape

Name: Navigating the AI Frontier: Investing in AI in the Evolving Cyber Landscape
Uploaded: 2024-07-09
Duration: 42 min 55 s
Description: Navigating the AI Frontier: Investing in AI in the Evolving Cyber Landscape Chenxi Wang In her keynote, Chenxi explores the intricacies of the CybersecurityxAI landscape from an investors’ lens, highlighting prime opportunities for automation, ethical considerations, and risks in this rapidly-evol

BSidesSF42:551.0K viewsPublished 2024-07Watch on YouTube ↗

Speakers

Chenxi Wang

Tags

StyleKeynote

About this talk

Navigating the AI Frontier: Investing in AI in the Evolving Cyber Landscape Chenxi Wang In her keynote, Chenxi explores the intricacies of the CybersecurityxAI landscape from an investors’ lens, highlighting prime opportunities for automation, ethical considerations, and risks in this rapidly-evolving market. She will dive into real-world adoption of AI across security domains, the importance of responsible AI deployment, risk management strategies & tips on innovation in this space. https://bsidessf2024.sched.com/event/196561150b04eabcd9710eef31b9acfc

Show transcript [en]

now more than enough rambling for me I'm I'm sure you're tired of it so time for the real content to begin um our F first keynote uh uh this weekend is Dr chiny Wang speaking about navigating the AI Frontier investing in AI in the evolving cyber landscape without further Ado May the 4th be with you and welcome to the stage Dr Wang hope this works okay thank you thank you everyone for coming this is awesome bides is one of my favorite conferences um Jack Daniel would be proud to see what it is today from an idea he had many years ago uh he's an old friend of mine so I'm very happy to be keynoting here today um I'm going to

be talking about Ai and cyber I'm a cyber person um I'm an investor now but I used to be a technologist so today I stradle between uh business world and Technology world so my talk isn't as technical as some of those that you'll see here but it'll give you an interesting perspective coming from someone who look at technology from investment and from a money-making standpoint that could be interesting to you all right a little bit about myself um you can tell this is AI generated you know there the the text all mangled so I was asking someone the other day I like when are they going to work on better text um but see even just that part will

take a lot of programming effort to make it happen uh so I am a computer scientist I have a PhD in computer science started as an academic teaching computer science in KY melon as my first job and then I came to Industry and did a number of things product strategy market research um Executives at large and small companies and now I'm an investor so I think I have lot of different perspectives on why a piece of technology work or doesn't work and that's what I'm going to talk about today um these are some of the interesting statistics that you probably already know it only took chat gbd 5 days to reach a million users uh the

same number of users took Instagram two month Spotify five month Facebook 10mth and Netflix 3.5 years so it's very very different game we're talking about here um I looked at some stats on how many companies are using large language models at least experimenting and the estimate is that 200 million companies around the world so that's not a small number okay does that work and this is from um uh the latest AI report from uh data bricks uh they talk about what kind of things Enterprises are doing with AI and it's it's kind of iar you can't really read very well but if you direct your attention to over there there's detecting security threats in the count takeover attempts

that's kind of fraud related uh and analyzing communication for fishing attempts um in my world you know I get lots of early stage company pictures and I see a lot more than that in terms of AI applying for security as well as securing AI so we'll talk a little bit more about that um but a very interesting question to ask is um there's been a lot of money spent on what on AI infrastructure Nvidia stocks keep going up because a ton of companies buying their chipset um but an interesting question to ask is if you build infrastructure if you build the bottom layer infrastructure you should have the layers on top at some point and the

layers on top that will make sense that will be what a market looks like today we're seeing lots of infrastructure build out some data layer build out some application layer built out but the usage of application is still small even though 200 million companies are experimenting with language models they're not yet in the critical business process um so what does that mean there was lots of interesting talks about Golden Age of AI you know AI has been around for a long time but really since November 2023 is where it became a huge boom um we're not quite at the Golden Age inferences and training are still expensive now this is one thing that a lot of people building AI products don't

know when they build a product in their Labs or in their you know VPC environment on AWS um they're experimenting with it right so training on sort of a small set of data is fine and they doing their testing is fine but the minute they take it to a real production environment in customers environment the number one thing that both the customer and the product team um are surprised about is how expensive it is to do inferences and many people walk away say hey Enterprises users walk away and said oh it's kind of fun to to see something interesting generated by AI but at the the end of the day the added benefit that it provides me is not yet worth the

cost that I will put in for infan cost and run the models and all the other privacy security concerns so that says to me that we're not yet in the pervasive deployment stage we're in the experimental stage something will have to change in the cost equation for this to be pervasively deployed so think about all these companies are building AI agents here AI agents uh here this year uh winter YC batch has 170 AI companies 170 out of I think 200 companies so um everybody's doing AI many will not survive because cost uh they can't get a hold of GPU Cycles or what they're doing is a thin layer on top of large language models that

everybody else can do so we are still learning to walk um some of the things I've been noodling about right so I have a 14-year-old my son uh my husband and I are both phds in computer science my son couldn't be less interested in programming and he's always been that way and since he was like 10 we were like oh how about we teach you python how about we teach you you know whatever language we can do it we're like do phds computer science he's like yeah no thank you um and my husband and I were always be like oh you know what's going on and he's he's most interesting fishing to be honest um yeah and I even tried to have

him write a fishing programs like you know do a histogram on the number of fish he called each day and stuff and he's like my interested but he really could not be bothered with the the specifics of programming like learning syntax is like just he's like so bored um and and I've been always been like oh you need to you need to but recently I'm like I'm not sure he needs maybe he doesn't right why because I think software engineering is going through a huge transformational change it's not the software engineering that I know when I was learning programming I started Pascal I don't know if any of you learned Pascal right uh so went from

Pascal to um to C C++ Java and Python and all that I tell you lot of people use python python is a shitty language but that's okay um so software engineering is turning into model engineering because AI engineering will subsume parts of the mechanics of software engineering now what does um model development model engineering mean um some of the mechanics I said the low-level mechanics will get subsumed by um code generation capabilities that you know everybody uses but the telling it what to do testing the model and really understand how the uh the mechanics of cost of running the model as an application those things will still be very in very much manual and very much

done by engineering teams but it's not about you know do I have two equal signs versus one equal sign one's assignment the other one's equality test those things will be done by AI so we're not we should not be teaching our kids or Young Folks software engineering the same way we should be teaching them how to leverage a model a set of functionalities that's already built kind of like open source today but not the same I'll tell you why um and learning what is important for their application to be manifested in this AI stack um the second thing that I've been I I've been realizing um as I do this kind of no we do work all the time when we get into a

new investment area the work meaning that understand the trends of the market as well as understanding the fundamental building blocks of technology and look at what the gaps are and look at business opportunities um one of the things that it's very uh apparent to us is security um Enterprise security and consumer security if you will take on really take on New Dimensions and what are those Dimensions um security is a lot of times is about um making a decision on the set of data right is this thing malicious is this thing suspicious is this thing good um and in the past is really about the game of you know who has the relevant data who has the right data at

the right time so you can make a better decision than the attackers or maybe the attackers will make a better decision than you because they did their reconnaissance well um in this world uh in the AI world what we are seeing is the so of the the the data inequality between Defenders and attackers are kind of Disappearing uh I mean you still have inside the um Enterprise inside your your company you still have more data that other people don't know about but but the window of that being true is kind of diminishing um I got a a demo a toy demo still today uh but it's very uh as astounding to me so this person was able to uh set up a stage

environment where he can pull information from you know certain infrastructure components and security products and then ask or asked the question that um what is the biggest Gap in the security architecture overall and the um know AI will answer for you and um he's able to demonstrate that uh some of the most interesting security data he can like easily convert into flat Text data files and then generate embeddings put into Vector database and then have uh just a lightly modified third party model to turn through this so what what that means is lots of Point products that going forward may disappear because they are operating on oh I I've got this set of data and I have my proprietary

analysis algorithm I can tell you certain things about the set of data that you don't know um without my analysis but now a fairly generic th third party language model can do that for you as long as you can transform the data into the format it can consume uh so that's very interesting but also think about doing that from external standpoint right so I have an example later um you could potentially get to a point ask the question does Bank of America use product X and does Bank of America use product X in such a way and when you get to that kind of information then you might be able to direct your attacks much more in a

targeted way in a malicious way um so and there's also the uh the data economy is um taking on a also a very different set of Economics with um AI now so if a a underground criminal group wants to train dat train their own attack language model right so there's ways to get data it's different than in the past buying credit card data buying identity data right now it is really buying a set of data that can really direct their model to make um interesting decisions so um we already have seen in the underground economy where data Brokers are selling buying selling different things that's not necessarily for generating an attack package per se but it is for training

models fine-tuning models so we we expect to see that more and more going forward um so this is a an this is a toy example but I have seen this uh in a in a demo which I kind of was like o um so this is a uh a cve that it's a real cve that uh uh applies to uh this version of umuntu and there is a wild uh there is an exploit in the wild can take um advantage of that vulnerability now let's say you know that a particular product why protects against that exploit right lots of security products once the exploit is out it can detect it and whatnot and we know that uh this organization

that doesn't use why or this organization uses a product X that only works in the environment why doesn't exist right and once we know that kind of information and and tell you who uses what product is actually pretty easy to get these days and I'm not I would not be surprised if the large language model uh are able to get get their hands on those data right so you very easily can ask a question and know that you could unleash this exploit on this company um I don't know how many of you have maybe you guys all have have you asked Chad GPT about yourself yeah yeah yeah you have right yeah yeah but the back there yeah so um

I've asked very specific questions about myself that are not necessarily W Wily known and it knows it's kind of scary so next time try something try something obscure about yourself and but you know I'm I'm out there so maybe a little different but it it knows a lot of things that that we don't know and you could imagine a attack brain that is kind of similar to chat GPT I mean there is one out there that that the um underground economy uses but I'm talking about one that is as smart as GPT 5 that's being trained by nation states and that is being worked on so as Defenders um our jobs are getting more and more

interesting and difficult um okay so this is what I was getting to is um in the past some of the security defense relies on us having proprietary knowledge about our environments and having more data than attackers and that Gap might go away and then what happens um also specifically in terms of data security and application security the requirements are also changing right so I actually have a application security background um I wrote a compiler for my uh PhD thesis it's an officiation compiler and I still remember sitting there so many hours coding to get the compiler to work building compiler is not for the faint of heart um so aback is near dear to my heart and I've done a

lot of work in absite including sitting on the board of oasp and now I look at what appsc is going forward it's very different game so tradition let's talk about data security for a second traditionally data security is about you know data Discovery classification access management and and maybe permission management so that we can assure that the right data is accessed by right people right so all these things you know um but today actually data the boundary of data goes into things that you may not necessarily care about in the past it's not your own data sitting in your data store database it is the data that goes into train the AI model is part of the data you're using

it is the data that going to fine-tune the AI model and all that how do you test the right data is being used how do you test that the data is not biased does not have malicious uh intent all these now are part of data security slash application security domain and as far as I know we don't have a good handle on Discovery on data Discovery there data classification there as well as testing of the different categories of data right so I would love to have AB test like same model right so same like vanilla model and I train with a set of data without this this particular set or train with the data with this particular

set with a control test to see what the difference is for uh the two models um I haven't seen a lot of research work there but it would be really interesting thing to see sensitivity test of models on data and prediction test on data it's all very interesting um so absc really becomes no I say model secc um application testing is model testing pen testing becomes R teaming uh against data actually in 2010 I did a keynote for OAS Global conference and there I talk about how uh we should be gearing up for red teaming and absc testing of AI models and only took more than 10 years for for this to be a reality um so some of the uh specific

things about ABAC this is really small I can't read with my old eyes so I have to see um so in the past uh you know absc static analysis Dynamic analysis penetration testing all that stuff you guys do and you know really well um but now we are looking at a set of new added things right so um what data goes into training the model right do we have a a handle on the different sets of data are they uh like for instance if you want to train a model to recognize um architecture structure in a set of images right you know you you can give a different image different architecture style different buildings that's all

great but have you fed data that's a little bit darker darker images versus bright image have you uh fed images that are a little bit distorted and images test uh taken in the really bright reflective light all that things that you have to understand the same with security right um and model supply chain how many models are chained together in this model right there's some sometimes the open- source model actually uses other model on the back end um test against specific samples specific Cuts specific requirements you have for your understanding and for your use cases and do you have the right test cases right exploits against it um and know red teaming on prompt injections

the robustness against prompt injections and a serial data training test all those things should go into a robust Enterprise product but it is not today right um in fact I know many of the Enterprise products are including security products are training with the uh LM on the back end so they will come they actually come with an LM uh either in the cloud or you know in in your own infrastructure and it's it may not be exposed to the customer but it's doing some work on the back end uh to help you do things right and sometimes you don't even know it comes with it and what happens when your data is in this L that you don't know about

and and it's not really tested the same degree in robustness as the application code um and it impacts devops as well right so is there a way to like in increase the increase the cuz I can't read after like always turning around um there's plus sign okay now of course the cursor doesn't work well anyway we'll just wing it okay um oh okay thank you so in devops

okay by okay so in um in the old devops world so I played in container security for quite a bit so that figure a is very famous there right so we went from monolithic release uh monolithic Cay to to more The Continuous release versus once a year release so you m going from like big co-based to microservices and you deploying in a different way as well uh microservices is really interesting because it has changed the way we test programs we deploy programs and the way we kind of have identities percolated throughout the system right when you have um AI models all over the place right so think about in the in in the future what you have is instead of

microservices you have Micro models that's talking to each other through apis through some kind of data exchange Channel may or may not exist today um so one model may be drifting or maybe hallucinating in that whole architecture what's going to happen for the collective decision making that is done by those Collective microservices talking to each other making a decision we don't know I mean this is all very very early stage work but it will happen so in the future mark my words microservices are micro models talking to each other so devops are going to be very different you're deploying a new model in this architecture and leave the other models alone and you better be

robust enough that that the uh the whole Arch the whole application the whole model chain is still making the right decisions after you continuously update and deploying those models individually um the third thing that's interesting to think about is flipping a little bit is AI model as a Threat Vector right so we are all security people so we think about introducing new technology components into your environment could be a Potential Threat Vector um and this is also I think is I did not put the the citation here it's my fault I think it's found data bricks um but you can see that the number of large language models being deployed um sort of the um chat

GPT there um and launch this is the the red line is the um third party LM apis are going up right and so it there's a prompt in like people using it on the um command line there's also apis apis indicate sort of serious usage right because you're building on top of it as opposed to me just asking what's the best restaurant in San Francisco right so if we have a lot of people are using those third party LS as a um uh you know universally deployed cbase and if if that particular LM has a security vulnerability you could take many organizations down just like you know we talked about monoculture in the in the in the past

and right now we seeing some of the very popular ones popping up on on top right open AI anthropic and llama llama's very widely deployed in llama too so um who is looking after security for those models um think about solar wind we could have a solar wind of models so right and what will happen to the industry and all the applications that build on top uh so these are kind of the new things that we've been thinking about I want to bring it out to just have a discussion here to show you that that's some of the um new issues and concerns that not only are impacting our industry also as investor I look at this

I'm say oh that's an interesting questions whether we um I look at from two ways one is Market TRS the other one is is there Innovation opportunity for new product or new uh piece of capability coming Town um and we talked a lot about this and this is slightly different uh point is open source right so we know many of us experiment with open source software now with open source model now is an open- Source LM the same way that is open source software no it's not so they may give you the code right but if they don't give you the data that it's train on it's not 100% open source they don't give you the weights in the model it's

not 100% open source so they may say hey take this open source model but all those important factors important uh uh factors are are obscure to you you may not know the same way about this model as you would of Open Source software so that's another thing to think keep in mind and and the last thing that I'm really interested about is AI for security as an opportunity right so earlier I talked about I saw a demo where they bring in you know xdr data they bring in email data they bring in I think was some toy Sim data and and they all transform into flat data uh text file and fed through a um uh sort of a a framework to do uh

indexing and embedding generated put into Vector database and then um is using L against that and is able to generate actually pretty interesting questions about you know how should what's the most uh big biggest largest gaping hole in the security architecture it will tell you uh it's just a it's still a toy example today but I think with the um know Improvement of Technology we could see that as as an opportunity um so AI for security being able to process large volume of data in the way that's targeted for a certain task uh is happening right so we see things like um automated root cause analysis which is you know taking in data and look at uh what are the root

causes for other issues in the past and using that to train a model and answering root cause analysis using similar ilarity search and um data embedding to answer those questions um the accuracy is not quite as high as I would like to see as a production product but it's getting there in the past like four or five months I've seen some of the work that's already the accuracy is improving quite a bit um you know obviously lots of boosting productivity usage of AI applications um so I've been collecting Enterprise deployment of AI models and asking them what are you doing right not just in security but in general just understand the the usage of it um there's a large

insurance company potentially number one number two in us um every you guys might those in the US would know there's an open enrollment period for health insurance right it's it's typically starts November so this company uh in November starting November 1 they'll hire 25,000 temp workers and all they do sitting on the phone taking uh questions from subscribers their healthc care policy subscribers calling in to say oh does this policy cover this and how much do I have to pay out of pocket if I subscribe to that policy right so and that people like me don't call but a lot of people call um and when you call somebody has to like look through a ton

of documents to give you the answer so uh and this company this health care company health insurance company has been collecting data about the performance of human agents right so many years so the accuracy of human answering question they have tens of thousands documents right some of may be less accessed than other but the human agent accuracy is 66.7% and then they've been like adding on AI agents uh just in the mix to see what the AI agent performances are and the AI accuracy uh is 99. some per right so a lot higher also um guess what a lot of people call in are in bad mood because they are like uh may be forced to pick a new policy they

may be sick right and the human agent has predominately universally really low empathy score cuz they've been you know the people are not in good mood when they call in um the AI agent doesn't mind right they don't get tired they don't get mad so they have higher empathy score so this company last year still 25,000 AI agent and this year they said they're going to reduce it to 12,000 uh human agent 12 from 25,000 to 12 and and supplement with AI agent and next year they are thinking they're going to reduce to zero maybe so that is a concrete example of how this is generative AI they actually training their own model but the uh what they really afraid of is

they like well the the human agents today are using some part of the AI agent as to help them answering questions and and they said it's actually pretty easy to pull out their business logic from the AI model and if you work in model building you'll know that from a computer science uh standpoint asking uh a set of interesting questions will let you understand how this model is built pretty easily and so they are really afraid of how a competitor could send a human agent to work for them and ask questions and and really understand how they um underwrite certain policies and how they monetize certain policies and that's a big concern of theirs but the

business benefits very visible and so they are going for it uh but they are looking at how do I obscure this AI model how do I protect the prompt that the human agents are using so that people are not pulling out my critical business logic anyway so but in the in the security world so uh how many of you have heard about ra rag right yeah quite a few so in um open AI train their model on vast available data on the internet you know whatever it's it's there they find it they crawl in and there they find things publicly facing um they train on documents images and all that stuff um the thing that

they don't have access to um are your internal data right so like your own legal documents they probably haven't quoted unless it's leaked somewhere um so your policy documents your emails maybe not available right so in order to make a decision uh that based on your own data a lot of times what you supplement a public lrm with a rag model right so retrieval augmented um gener generation right so is Rag and what that is is if you look at a human a human ask prompt to the the computer well uh so the if no rag it would just take the pump and give to the L and generate a a result and come back with rag what it

does is it packages up the query go up to a internal embedded uh internal um Vector database that's been that's been already worked on using internal data it's been chunked it's it's been indexed it's been uh embeddings generated in this no so your policy document your email could already been in that database so this this brain here this logic here will take the user query and send it up there to your own Vector database and that will that database will perform similarity search on the query and getting back the data to this logic here and this logic will take that whatever the data that's res um returned back from the vector D internal Vector database package it up together with the

query that's called the prompt before the prompt and send that entire prompt to the RM and the RM will then answer the question based on both its own internal logic and the the private data that's supplemented to it so this is called rag now if you think about we're doing security questions we're asking security questions what you need to do build your internal Vector database is taking security data right so what are security data we have what do we have we do have we have endpoint logs we have xdr logs we have Network logs VPC flow data we have email data we have policies some of them are structured some of them are not right log data is very

structured and turns out that Vector database doesn't deal with structur data really well it just it wants to deal with instructure data that's how it is and it doesn't quite make sense to take uh log data which is very structured into a and transform it into a vector database so what do we do um there are better ways to transform the data and chunk the data and put it in the way that's consumable by RMS and that is an interesting area of innovation that I'm looking at and but if you think about it taking you know if that the internal sources of data right lot lot of them are uh security structure data and you

put it into this Vector database using LM and you look at this whole thing it's able to answer questions able to point out things you need to do you're like well what products can I replace with this which is really interesting I have one more minute it's so really interesting question right so I I would challenge you to all all think about going back you say how many security products I have today in my environment if I use this this architecture here and assuming it's it's doable in the next no two three years what products do I not need and I've been thinking about that uh on a daily basis I think it's a really interesting question okay so I

don't have a lot of lot of um so just one thing we are investing in cyber and AI my my firm is called rain Capital we're very early stage investment firm so if any of you have an idea on AI and cyber or just cyber because we are cyber investors please find me um cheny at rain capital. VC um one thing uh can I just finish this before I turn over there are two areas in AI that people typically investing there's models that vertical industry models and there's AI INF structure infrastructure is more horizontal right the the risk in the vertical models is that um you may build those Healthcare model fintech model but a lot of them are only thin layer on top

of third party LMS if two guys can build this on a weekend another two guys or or woman can build this on the weekend so as an investor you don't want that you want to weed out the thin layer on top but there's because there's no technology mode for info structure there's usually more technical meat in the build however you have to be um be very careful how close you are to the black hole the black hole is what open AI what anthropic will do so if you build infrastructure on something that they will build they'll take your Market away completely so you have to say okay I'll work on infrastructure but cannot be too close to the black hole and with

infrastructure it's harder to go to market so so there's both sides have risk so with that I'm just going to but um one thing one thing here here so today we have about 200 Mill 200 billion Market uh Global spend on security about half of it is on SEC Professional Services literally about half is Professional Services on security and we think about 80 or 90% of that work can be automated by AI so that's a huge huge opportunity so with that I'm going to close and the future is AI and security and you are in the right place so um I congratulate you for being in the right place right time and I hope to hear from

many of you on new ideas new Innovations and maybe new companies thank you

Navigating the AI Frontier: Investing in AI in the Evolving Cyber Landscape

Related talks