← All talks

Jake Coyne & Andrew Gomez - SplunkGPT

BSides Augusta · 43:49 · 263 views · Published 2023-10 · Watch on YouTube ↗
About this talk
In the evolving landscape of cybersecurity, professionals are often inundated with vast amounts of data. Splunk has been a game-changer in analyzing and visualizing this data. However, crafting precise queries in Splunk’s Search Processing Language (SPL) requires expertise and can be time-consuming. Enter SplunkGPT – the start of a solution that harnesses the power of GPT-3 to transform natural language queries into SPL, making data retrieval more intuitive and efficient. In this talk, we will journey through the development of SplunkGPT. We will start by exploring the capabilities of OpenAI's GPT-3 in processing natural language queries. Through live demonstrations, we will observe how GPT-3, in its vanilla form, can handle basic queries but falls short when faced with complex, domain-specific questions. Recognizing these limitations, we will delve into the world of fine-tuning GPT-3. We will unravel the process of collecting domain-specific training data, creating templates, and refining GPT-3 to understand the intricacies of SPL and cybersecurity data. The audience will gain insights into the challenges and best practices of fine-tuning a language model for specialized tasks. Next, we will unveil the architecture of the semantic parser that integrates the fine-tuned GPT-3 model. We will discuss how this parser converts natural language queries into SPL queries, and how it is seamlessly integrated with the Splunk database. Finally, we will explore the broader applications and implications of this technology in the cybersecurity domain, followed by an interactive Q&A session.
Transcript [en]

Let's get started with our final contestants of the day: we've got Andrew and Jake. Only one person clapped? All right, all right, it's been a long day. Thank you, everyone, for coming. Today's presentation is on a tool we created called SplunkGPT. Not only are we going to discuss how we ended up creating this tool, we're also going to go over some prompt engineering techniques and how to use OpenAI, specifically the API calls, for your own endeavors. If you end up in the situation we ran into, or you need to build an agent, we're hoping that by the end of this presentation you've

at least got the key components. We'll also reference our GitHub page here, with how to build your own agent, and hopefully this helps spark some ideas toward the end. So without further ado, let's move on to the who-am-I. My name is Andrew Gomez. I'm an offensive operator at SIXGEN, which is a fancy title for a pentester. I've been there a couple of years, and prior to that I worked in the Army as a computer network defense manager. I'm Jake Coyne, very similar background: started out in the Army, did the DCO stuff, went over to the OCO stuff for a little bit, and then made my

way into the Reserves, and I also work as an offensive operator at SIXGEN. All right, we'll move on to the next slide. As we go through this agenda, think of the first bit as more of an academic introduction: this is what an LLM is, here are the limitations. Then we'll move over to what we call our cookbook, our Jupyter notebook, where we're going to show off some of the examples. All of this will be available later on and we'll provide a link to it, so if something doesn't make sense, or you just

want to steal this for reference later on, we'll provide all those resources and documentation at the end. So for our agenda today: a quick introduction, what led us down the road of engineering SplunkGPT, discussing what agents are, and ultimately demonstrating our tool in action in a pre-recorded video, because I don't really trust the demo gods; they're going to get us at some point, and that's the truth. So we kind of started with SplunkGPT as a plan, self-prompting to sort of make it work, but before we get into SplunkGPT itself, we want to

bring you through some of the initial aspects of an LLM and what it really is. An LLM is a specialized AI model that has the ability to generate human-like text. What we have is a model trained on a vast corpus of data with different patterns and different nuances, and that leads to unpredictability inside the model itself. If you've heard of examples like LLaMA, Flan, BART, or GPT, then you've heard of some of the most predominant LLMs out there today. But like any technology, there are a lot of limitations, and one of the main

ones is token limits. GPT-3, for example, has a 4096-token limit; the slide says character, but it should probably say token. At the end of the day, the token limit is based on the model itself, so you're going to have variation. GPT-3 in its original state had 4096, and roughly four characters is a token. Each token is going to cost you some money, so looking at computational cost, you're going to spend somewhere around a fraction of a penny per thousand tokens. But as you move up in model, for example if

you go to GPT-4, you're going to be paying pennies, which is significantly higher, especially when you have long prompts and long responses that you feed back into the LLM for contextual relevance. Another path we ran down was fine-tuning: if you're trying to fine-tune your model, that's also going to cost you a good amount of money. We spent some money there, and we'll let Andrew show you more about that in a minute. The model itself has these limitations that you need to take into account. If you wanted to run a local LLM, for example, you're going to have power consumption requirements and other attributes to take into consideration when you're building your system.
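The cost arithmetic just described is easy to sketch. This is a minimal estimator using the rough four-characters-per-token rule from the talk; the per-1K-token prices are illustrative placeholders, not real OpenAI pricing.

```python
# Rough cost estimate for one LLM call, using the speakers' rule of thumb
# that ~4 characters make one token. The per-1K-token prices below are
# hypothetical placeholders, not actual pricing.
PRICE_PER_1K = {"gpt-3.5": 0.002, "gpt-4": 0.03}  # USD per 1000 tokens, assumed

def estimate_cost(prompt: str, expected_response_chars: int, model: str) -> float:
    """Approximate the dollar cost of one prompt plus its response."""
    total_chars = len(prompt) + expected_response_chars
    tokens = total_chars / 4          # ~4 characters per token heuristic
    return tokens / 1000 * PRICE_PER_1K[model]

cost = estimate_cost("Write a Splunk query for failed logons", 2000, "gpt-4")
```

Long prompts plus long responses fed back in for context are exactly what makes the higher-priced models add up.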

The last thing, and some people would argue one of the more important things (we saw it in Ed's keynote at the beginning), is that models can hallucinate. Their intent is just to be a next-word predictor, and there's some unpredictability because of the complexity of the model; they operate in more of a black-box sense, where we put stuff in and what comes out is what comes out. So they can make things up, they can hallucinate, and if

you're a New York attorney, that can turn out pretty bad for you. So these are the limitations we have with LLMs, and some of the examples we came across when trying to work through this app. One of the solutions we implemented (we didn't really come up with it) was the use of agents. But before we get into that, I want to let Andrew walk you through some of the key points I just talked about, in the notebook. Yeah, so what we did is develop a cookbook here to kind of

walk people through it if they want to later on, and to let them learn on their own. When it comes to the limitations of an LLM, we mentioned hallucinations, and we also mentioned that we don't really know what data it was trained on to start. Hopefully this is big enough; if not, we can expand it a little. What we did is ask in the user prompt: write us a Splunk query that lists users with failed logon attempts. We used the davinci model back from GPT-3 and gave it a temperature of one, meaning it doesn't have to pick what it thinks is 100% going to be the next

token; it has some flexibility with what it thinks the next possible token should be. If you use the API playground in OpenAI, you can see that demonstrated in green boxes: the more green, the closer to a 100% chance that that is the next token, versus the yellow and red boxes. So we're giving the LLM, in this case the davinci model, some flexibility with its response, and it really didn't come too close. This wouldn't really run in Splunk either, using the failure count indexed by username or by host IP. So we noticed that and thought, okay, it's kind of giving it an attempt; maybe it just

doesn't realize what the pattern is, and that's when fine-tuning comes into play. You fine-tune when you want to help the model learn a pattern to output, not to learn more data. Nicely documented in OpenAI's documentation, step one is to gather the data. We gathered roughly 500 different queries and formatted them in the format they recommend: the prompt, which is what you think a user would ask, followed by the completion, which is what the expected answer should be. You can see the example there, looking for failed logon attempts for both the administrator and guest accounts with more than five failed logon attempts. I

think we got this one from... I don't actually remember where we got this one; maybe it was from the GoSplunk database, from a question someone had asked on there. Once you gather all that data and have it in the proper format, you go ahead and start training a model. In this case we selected the davinci model and followed along as it tuned. Later on, back in July, we asked that tuned model: we still want to list out users with failed login attempts, let's see how you do. And it did come back with a Splunk query, but it still wasn't quite on the right track.
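The prompt/completion training format being described can be sketched as JSONL. The query pair below is an invented stand-in for the speakers' ~500 real examples, and the separator/whitespace conventions follow OpenAI's legacy GPT-3 fine-tuning guidance as an assumption.

```python
import json

# Sketch of the prompt/completion JSONL format the legacy GPT-3 fine-tuning
# endpoint expected: one JSON object per line. The query text is a made-up
# example, not one of the real training pairs from the talk.
examples = [
    {
        "prompt": "List users with failed logon attempts ->",
        "completion": " index=main sourcetype=WinEventLog:Security EventCode=4625 "
                      "| stats count by Account_Name\n",
    },
]

with open("splunk_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The trailing `->` separator and leading space on the completion are conventions that help the tuned model learn where the prompt ends and the expected pattern begins.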

So we'd started down the path of having the right format, but in reality we're looking for something like index=main, the Windows security source type, event code 4625 for those failed logons. It's not quite there, but we're in the right direction. So we took time to sit down and said, hey, we've spent about 40 bucks on 500 queries and we've gotten in the right direction, but we're not where we expected to be; we expected something like the response listed above. So we asked, okay, what are

our options? Do we continue to iterate on this tuning process? Is there already a tuned model, because we're probably not the first ones to come up with this idea? Or should we do something called creating an agent? We'll discuss a little more about what that is. Ironically, a couple of days after we attempted to tune our model, we saw Splunk had developed their own version of a copilot, and Jake's going to go into how exactly this works. Yeah, it's similar to what Andrew was talking about, but as he scrolls through here, really all the first function does is pull from download.splunk.com. You can go to Splunkbase, download

that application, unzip it, and look inside, and you'll see that there is a model in there. It's a T5 model, CPU-optimized, so they have it in the Open Neural Network Exchange (ONNX) format. These classes and all of this stuff are pretty much ripped straight from the code that's in that application. As we scroll down here, we're just using the model that Splunk tuned; they have a whole nice blog post about it, and it's pretty much going down the exact path we were. So we figured, okay, this proves our concept enough: if we can use this model and

prove what we're saying, here we go. So now we ask the Splunk AI model: write us a query that lists the users with failed login attempts, and you see it gives a much better response, something that can actually run inside a Splunk database. But now let's ask it about something that likely didn't exist when they fine-tuned this model, or when the T5 model was initially open-source released. If we ask about ADCS ESC1 (thanks to the SpecterOps folks who are sponsoring, so we figured why not throw that in there), it doesn't really give us something that's exactly great. So we figured, all right, if SplunkGPT

wants to go this route, it needs to have access to some tools, it needs access to some agents, and that's where we started down the road of agents. Before we get into all the intricacies, we want to go through some of the engineering steps, some of the key concepts from this adventure we went down. Arguably one of the more important parts of working with any LLM is your prompt engineering. There are plenty of prompt engineering methods; Andrew will show examples of some of them, but there are examples on the slide: we have zero-shot, few-shot, chain-of-thought, plan-and-solve prompting,

and reasoning-and-acting (ReAct). Some of these are more recent techniques that were released to the public, and definitely something we used within this project. The next thing you have to determine, to get output quality and efficiency, is memory management: you have short-term and long-term memory, and you have to decide how you're going to manage the conversational history when you're interacting with the model. One piece of that is vector databases. A vector database is just a database that can take in vector stores of information. If any of you sat through the Nemesis talk, they

talked about taking some of the documents they pulled from, say, a Cobalt Strike beacon, forwarding them into an embeddings model, getting embeddings out, and doing semantic search against them. We do a similar style of approach here, where we take proprietary intelligence or proprietary information (a fictitious blog post, in our case, for the demonstration) to question-and-answer and identify contextually relevant information. Then, to address the problem with things like ESC1, we give it access to some tools. We have a different agent that has access to tools, and those tools are things like Google search or Hugging Face; those are examples,

but we really used Google search, and then we wrote a little function that allows it to go in and actually query a Splunk database using the open source Splunk SDK. LangChain was the main component of how we went about this, and you can see it when it's out there. Then lastly, chaining it as an agent: the big part here is we're able to give the system, the agent, access to tools and allow it to make a decision rationally, act on that decision, or take a thought about whether it answered the question and what tools it needs in order to answer it.
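A helper along the lines of what the speakers describe might look like the following. This is a sketch, not their actual code: host, port, and credentials are placeholders, and it assumes the open source `splunk-sdk` Python package.

```python
def normalize_query(q: str) -> str:
    """Splunk's search endpoints expect queries to start with a command;
    prepend 'search ' when the model's output omits it."""
    q = q.strip()
    if not q.startswith(("search ", "|")):
        q = "search " + q
    return q

def run_splunk_query(q: str, host: str = "localhost", port: int = 8089,
                     username: str = "admin", password: str = "changeme"):
    """Run a query via the Splunk SDK (pip install splunk-sdk).
    Connection details here are illustrative placeholders."""
    import splunklib.client as client
    import splunklib.results as results
    service = client.connect(host=host, port=port,
                             username=username, password=password)
    # oneshot blocks until the search completes and returns the results
    job = service.jobs.oneshot(normalize_query(q), output_mode="json")
    return results.JSONResultsReader(job)
```

Wrapping the LLM's raw output in `normalize_query` is the kind of small guard that keeps a generated query from failing on a missing leading `search` command.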

Then it actually acts on that by having access to the tools in an external system. For example, with Google search, if we say ADCS ESC1, it'll say, I don't know about that, go out to the internet, and get some more information. So I'm going to pass it over to Andrew, and he's going to walk us through some of this. Starting off with the zero-shot approach: it's probably the most common approach we use day-to-day with something like ChatGPT. This isn't using ChatGPT, though; this is using the GPT-3.5 LLM, so there's a little bit of a

difference here. The first thing is that when you start using the API with 3.5 and above, you're able to start setting different roles, those being system, user, and assistant. When setting something like the system role, you can see in the message that we're telling the LLM to act as if it's a cybersecurity analyst, so it's going to modify the way it responds: as an analyst instead of your average person. Once again, we're asking it to create a query that lists users with failed logon attempts, and simply giving it some room for creativity with a temperature of one.

With a zero-shot approach, we're not providing any other contextual examples or information on how we want its response to be, so it gets creative: sure, here's your information, index=your_index, sourcetype=your_sourcetype, action=failed. We've definitely got a Splunk query here; it just doesn't know our indexes, it doesn't know what our sources are. That's one of the issues we ran into with just implementing a zero-shot approach: you can't take that query and run with it, it needs to be enriched somehow. So maybe one of the ideas there would be implementing something like few-shot.
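The zero-shot call just described (a system role plus a temperature of one) can be sketched as plain data. Actually sending it requires the openai client and an API key, so this sketch stops at building the request payload; the model name and wording are illustrative.

```python
# The system/user role structure described in the talk, built as a plain
# request payload. Sending it would need the openai client and an API key;
# this sketch only constructs the data.
def build_zero_shot_request(question: str) -> dict:
    return {
        "model": "gpt-3.5-turbo",   # illustrative model name
        "temperature": 1,           # room for creativity, as in the talk
        "messages": [
            {"role": "system",
             "content": "You are a cybersecurity analyst."},
            {"role": "user", "content": question},
        ],
    }

req = build_zero_shot_request("Write a Splunk query that lists users "
                              "with failed logon attempts.")
```

Note that zero-shot means the `messages` list carries no worked examples, only the role framing and the question itself.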

Here we're doing a very simple example; we're not trying to pull something from our tool. We're just providing the example of what a positive and a negative comment is. With few-shot, you're providing answers in advance of what it should think; here we're saying keywords like "love" are positive and "hate" is negative. So when we ask, do you think this presentation is great, it says: yep, I think this presentation is great, and it's also a positive comment. That's how few-shot works. An adaptation of few-shot would be chain-of-thought: essentially, now we're starting to help it reason. We didn't really give it

a way to reason through someone's words or determine whether something is negative or positive, so if you start giving it something a little more complex, like "create me a list of tasks," it can get pretty funky about how it thinks tasks should be thought through. Here we're taking an example from our tool, and we're saying: you're now a task-creation agent; you're not part of any system or device; your first step is to understand the problem, then extract relevant data sources, fields, and field values, and then, based on the objective provided, create a complete plan. At the very end, your plan

should always be to interpret the results. That's essentially how we're going to end up asking our agent to come up with steps to do something like detect a Kerberoasting attack, and that's what chain-of-thought is. But you'll see down below, it's kind of unstructured data, hard to parse through. If you were to use this, you'd want to hope that it always starts with the words "step one," then "step two": those keywords help you find that structured response, and that's something you can't always get when you use a temperature of one.
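The parsing fragility being described is easy to demonstrate. This sketch assumes the model happened to emit clean "Step N:" markers, which is exactly what a temperature of one does not guarantee; the sample text is invented for illustration.

```python
import re

# Free-form chain-of-thought output only parses cleanly if the model
# reliably emits "Step N:" markers. If the wording drifts, the regex
# silently misses steps. The sample reply below is hand-written.
def extract_steps(text: str) -> list[str]:
    return [m.strip() for m in re.findall(r"Step \d+:\s*(.+)", text)]

sample = """Step 1: Understand the problem.
Step 2: Extract relevant data sources and fields.
Step 3: Interpret the results."""

steps = extract_steps(sample)
```

If the model instead wrote "First," or "1)", this parser returns an empty list, which is the motivation for forcing a structured output format.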

That's when you start implementing something like the plan-and-solve method. We're combining the chain-of-thought approach with few-shot, and we're giving it not only step-by-step thinking instructions on how to create a list of tasks, but also telling it how it should respond, and in what format. Here in the examples, it's an almost identical setup as far as the model, the message, and the roles, but the examples are where it really starts to deviate: we're telling it what a JSON format should look like, so that later on we can parse this information with Python's json.loads. You'll see the example doesn't have to be related to the question at hand; you

just have to provide an example of how you want things to be formatted. Later on, instead of saying step one, two, three the way it listed out steps before, it outputs them in a format that can then be passed to something like function calls. So how do we end up chaining some of those prompt engineering techniques we just mentioned? With LangChain. It's a nice little wrapper that helps you tie together all your different APIs; maybe you have different tools or different functions that you want to tie together, and this is where LangChain really comes into play and becomes a really powerful tool.
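To make the contrast with free-form steps concrete, here is a sketch of parsing a JSON-formatted task list. The model reply is hand-written for illustration; the field names are assumptions, not the tool's actual schema.

```python
import json

# Why the plan-and-solve format matters: once the model answers in JSON,
# the task list parses with json.loads instead of fragile regex parsing.
# This response text is a hand-written stand-in for a real model reply.
model_response = """[
  {"task": 1, "agent": "splunk_writer",
   "description": "Write a query to detect Kerberoasting"},
  {"task": 2, "agent": "filter",
   "description": "Refine indexes and fields"}
]"""

tasks = json.loads(model_response)
for t in tasks:
    print(t["task"], t["agent"])
```

Each parsed task carries an `agent` name, which is what lets the tool route it to a function call later.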

We'll see that at the very beginning with the final prompt engineering technique, which is reasoning and acting, or as you'll sometimes see it called, ReAct. Typically when you see someone use ReAct, you'll see them load the toolkit (SerpAPI is the Google API tool), specify what LLM to use, give it the key itself, initialize the agent, and then, because we've decided to return verbose=True, you can see it in action. Finally, you say: this is the question we want answered. So here what it's going to do is reach out to the internet and go through its ReAct process, but you

can't see that. So what we did is find the GitHub issue where someone said, I'd actually like to modify the way it thinks through some of this approach, and the answer was: no problem, if you're interested in modifying the way it thinks, by observing the question, the thought, the action, the observation, once again a thought, and then the final answer, here's how you would do it. What you do is break out the prefix, your format instructions, and finally the suffix. Up here in the prefix is where we start adding something like: you are a detection engineer, you will do the following chain-of-thought approach.
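The prefix/format-instructions/suffix split being described can be sketched as plain string assembly. The wording below is illustrative, not LangChain's exact default template, and the tool names are placeholders.

```python
# Sketch of the customization the speakers describe: a ReAct agent prompt
# assembled from a prefix, format instructions, and a suffix, so overriding
# the prefix injects the "detection engineer" role. Wording is illustrative.
PREFIX = ("You are a detection engineer. Answer the question using the "
          "following tools:")
FORMAT_INSTRUCTIONS = """Use this format:
Question: the input question
Thought: what to do next
Action: the tool to use
Observation: the tool's result
... (Thought/Action/Observation can repeat)
Final Answer: the answer to the question"""
SUFFIX = "Question: {input}\n{agent_scratchpad}"

def build_react_prompt(tools: list[str]) -> str:
    tool_list = "\n".join(f"- {t}" for t in tools)
    return "\n\n".join([PREFIX, tool_list, FORMAT_INSTRUCTIONS, SUFFIX])

prompt = build_react_prompt(["google_search", "splunk_search"])
```

Only the Final Answer line is returned to the caller; the Thought/Action/Observation lines are the loop the verbose flag exposes.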

That's where we end up piecing it all together for the very initial start of our agent, which is: we need to identify which Windows event codes could be used to detect ADCS escalation path one. You'll see it in action, saying: all right, I need to go look this up; in order to look this up, I determined that these are the event codes, event ID 4876, and it starts listing out what they're associated with. The final answer is the only thing you actually get back, which is: these are the event codes, and you proceed from there. But just that by

itself isn't enough; you're going to have to take that information and use it to enrich the task list, which is what we'll demonstrate later on in the tool. The last little bit is vector databases. As Jake mentioned, they're a great way to reference local files or local information and quickly reference it for your answers. Here we're providing a very fictitious scenario: a company called Evil Corp has developed their own software called BSides Augusta. They unfortunately fell to a supply chain compromise attack, where that BSides Augusta piece of software now tries to install a backdoor on every system it's installed on. Since it's a

service being run as admin, the first thing it tries to do is check: is there a local user called bsides? If not, let me create and add that local administrator account. So that's the fictitious scenario, and that information is stored in this blog post. What we've done is break down and store that information via an embeddings model, and just ask: what are the three MITRE ATT&CK techniques tied to the BSides Augusta supply chain compromise? If you went out and asked an agent to use SerpAPI, it's not going to know; this information doesn't exist publicly, and it'll either come back and tell you it doesn't exist, or it could also hallucinate.

But by enabling this vector database, we've now got the ability to research information that you might not want publicly accessible. Maybe you've got a private report specific to your company, and this could help the team rapidly search through that information and then use an agent to query your SIEM and analyze the results based on it. The last little bit is tools for interaction. We've already mentioned one big one, which is using SerpAPI to search the internet, and we've started to mention vector databases. You can use more than one tool, and that's really all this last example is getting at.
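The vector-store lookup pattern just described reduces to similarity search over embedding vectors. This toy sketch uses made-up three-dimensional vectors and cosine similarity; a real system would get the vectors from an embeddings model.

```python
import math

# Toy semantic search over a local "vector store": each document is paired
# with an embedding vector. The 3-dimensional vectors here are invented;
# in practice an embeddings model produces them.
store = [
    ("BSides Augusta software installs a backdoor and adds a local "
     "administrator account named bsides", [0.9, 0.1, 0.2]),
    ("Quarterly sales figures for Evil Corp", [0.1, 0.8, 0.3]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def search(query_vec, k=1):
    # rank documents by similarity to the query's embedding
    return sorted(store, key=lambda d: cosine(query_vec, d[1]), reverse=True)[:k]

hits = search([0.85, 0.15, 0.25])  # vector a model might give the question
```

Because the private blog post lives in the store, the question about it resolves locally instead of going out to the internet and risking a hallucinated answer.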

We're using the ReAct prompt, we're using SerpAPI, and now we're also using a math tool to help determine, based on today's date, how long ago December 2nd, 2013 was. It goes through the process of Googling when BSides Augusta first occurred, and the answer between the date provided and today is 3,595 days. So that one example shows you don't always have to use one tool; you can chain multiple tools into the LangChain chains, and it'll piece it together as it goes through its prompts. Here the ReAct prompt determined: I actually need to use the math tool to figure out the answer. And that's about it before we dive

into the agent. Or am I jumping into the agent? Okay, sorry, we originally had an extra slide and I always get thrown off on this part. We'll go ahead and dive into the agent itself, specifically building an agent with function calls. We mentioned the first step is to generate a list of tasks and then research. What it's doing on the back end is determining: should I research locally, or should I research out on the internet? Then, based on the first task, which is going out and researching, it refines its list of tasks. So now I'm no longer looking to create a Kerberoasting query in Splunk; I

want to create a query to detect Kerberoasting using these specific indexes, source types, and event codes, so the query itself becomes a lot less likely to fall into the trap of a hallucination. We start executing each task one by one, and that's all we're trying to highlight with this little mind map: it's going to go through its execution. Should I start by writing the Splunk query? Should I then go back and determine what fields I actually have? Should I refactor the Splunk query that you gave me? And finally, does this query need statistical analysis to help refine a detection approach? Does

it need something like a count greater than 10? Then finally, we can't always trust it to be done completely on its own; there's a final piece that asks for human interaction, human input. Hey, your query actually uses a completely wrong index, or a completely wrong source type: this is the opportunity for the human to interact and say, nope, actually remove this line, or modify the query to say this. Then it executes, retrieves the results, and analyzes them. What we've done here is provide a snippet of what those tasks look like and how we're calling

each function. When it creates the tasks, it determines: the first task should be number one, and the agent, the function we're going to call, is the Splunk writer; second is filter/refactor, then statistical analysis, execute, and finally analyze. That's much like how a human would approach it: in order for me to write the query you're asking for, the first step is to understand the problem, determine what information helps me answer it, and then ultimately interpret my results and constantly refine. That's all we're trying to bake into this agent at the end of the day. Here the function calls are really just going

through conditional statements, checking that agent field: is it the writer field, is it the filter field? That's all it's doing here. I've commented out the actual tool in action because Jake's going to be demonstrating that in a second, but essentially this is what it looks like on the back end, which our Streamlit application isn't really going to highlight; it's just going to show, hey, I entered this function. When it first starts off, you see the BSides Augusta attack starts by writing a Splunk query that literally reads splunk_query=, so straight off the bat it's not a valid query, but then it filters and refines.
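That conditional dispatch on the agent field can be sketched as a small routing table. The handler bodies here are stubs standing in for the real LLM and Splunk calls, and the agent names are assumptions based on the talk's description.

```python
# The conditional dispatch the speakers describe: each task carries an
# "agent" field and the loop routes it to the matching function. Handler
# bodies are stubs; in the real tool they call the LLM and Splunk.
def splunk_writer(task):
    return f"wrote query for: {task}"

def filter_refactor(task):
    return f"refined query for: {task}"

def analyze(task):
    return f"analyzed results for: {task}"

AGENTS = {
    "splunk_writer": splunk_writer,
    "filter": filter_refactor,
    "analyze": analyze,
}

def run_tasks(tasks):
    log = []
    for t in tasks:
        handler = AGENTS.get(t["agent"])
        if handler is None:
            log.append(f"unknown agent: {t['agent']}")
            continue
        log.append(handler(t["description"]))
    return log

out = run_tasks([
    {"agent": "splunk_writer", "description": "detect Kerberoasting"},
    {"agent": "analyze", "description": "summarize results"},
])
```

A dict of functions keeps the routing flat; adding a new agent (say, a statistical-analysis step) is one more entry rather than another if/elif branch.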

It starts creating a more valid query, and ultimately we come down to the point where we ask for user input: is this good? Yep, it's good, or no. At the very end we say we don't need to modify anymore; it goes ahead and executes it, and then a very brief version of the answer comes back, saying the Windows 10 host was the one that was compromised. Jake will show our tool in action here. To give you a little background on the environment we used: if no one's ever used the Game of Active Directory GitHub repo,

it's essentially a nice automated way to quickly deploy three domains with a bunch of built-in Active Directory vulnerabilities, and it lets you learn things like ADCS attacks, or maybe you want to learn about relaying or NTLM downgrading. It's a great tool; go out there and set up your own deployment. It takes about three or four hours to deploy the entire domain, or in this case three domains, versus imagine standing up those three domains on your own and then configuring them for those vulnerabilities. This is just a nice automated way, and it saved us a lot of time. So now, I guess, demo time. Yeah, and one other point on Game of

Active Directory: there's a great blog. The people who wrote it, Orange, I think it's Orange Cyberdefense or something like that, have a great mind map and a blog that goes along with it, so I definitely recommend that. So in the demo here, what we have is the Streamlit application, which is just saying: write a Splunk query to detect [insert what you want] in my Windows domain. We put in Kerberoasting, and the first thing that happens, as Andrew was showing, is the tool determines: I need to research, I need to find more information about Kerberoasting, and it goes out to the

internet. It takes SerpAPI, well, it scrapes for URLs, takes those URLs, and sends them to another function, which takes all the content on each web page as plain text. That then goes to another LLM, which is told: summarize this content. Because of the token limits we talked about, we can't just send seven web pages' worth of plain text and say summarize this for me; we need to summarize each chunk, and then summarize the chunks of chunks into one main point. Once it does that, we send that summarized chunk of chunks along.
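The chunk-then-summarize-the-summaries flow can be sketched like this, with `summarize()` as a stub standing in for the LLM call; the character budget is an invented stand-in for a real token budget.

```python
# Map-reduce summarization pattern from the demo: pages are split into
# budget-sized chunks, each chunk is summarized, then the summaries are
# summarized again. summarize() is a stub; real code calls an LLM here.
def chunk(text: str, max_chars: int = 2000) -> list[str]:
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize(text: str) -> str:
    # stub: stands in for an LLM summarization call
    return text[:60]

def summarize_pages(pages: list[str]) -> str:
    chunk_summaries = [summarize(c) for p in pages for c in chunk(p)]
    return summarize(" ".join(chunk_summaries))  # summary of the summaries

final = summarize_pages(["A" * 5000, "B" * 3000])
```

Splitting before summarizing is what keeps each individual LLM call under the model's token limit, no matter how many pages the scraper brings back.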

That goes to a different agent, and again these are different LLM prompts, basically, but LLMs that are pulling this information out of unstructured data, and it structures it for us. We said pull out the event codes that are relevant, and then we send that off to the Splunk search feature, which actually searches the Splunk database, does something like a field summary, parses that data, brings it back in, and feeds it to the next agent. As we go through, it's now creating that task list for itself, the one-to-N steps; it's determining which agent it wants to use because the

context is given. It then loops through and actually executes each agent. Once it's finished executing all of those agents, it comes out with this Splunk query, and as you can see, if you pause it here, I was just editing in the human feedback, and that shows some of the hallucination we were talking about: it got it a little wrong, and we just know this as humans. What it got wrong is actually the source versus source type: the sourcetype would have been WinEventLog, but the source is WinEventLog:Security. That was provided to the LLM, yet it

still got it wrong: it still put sourcetype. We said the source was this and the sourcetype was that, but it still decided to concatenate the two and do whatever the heck it wants. What's shown in that Splunk search a second ago is that the data doesn't actually exist, right? No results came back. So if we fast forward a little bit, you can see that the app will get the results back; this is just out-of-band, showing you the real Splunk search. You let this thing finish, it's going to come back with, oh crap, no results, and then the LLM

interprets the results it got and says, hey, the query was attempting to do this. And this is all about how you engineer that prompt: we engineered it to give a high-level summary as if you were presenting to a mid-level manager, so that's how it presents it. It's saying, hey, there was no detection of Kerberoasting. But now we're going to go through and ask khal.drogo — everybody who loves Game of Thrones would love to be Khal Drogo, well, maybe not at the end. So with that, we're actually going to run a Kerberoasting attack using that account on

the SQL service. Go ahead, run through — okay, that's successful, so we should expect to see some results here. And the exact same thing happens again: we do the research, go pull the relevant event codes, gather the relevant information, and create that task list for how we're going to build the actual Splunk detection, then move forward. In this case it got it right — it didn't put any sourcetype. We're just checking the query, and everything looks right, so we basically hit enter with no human feedback, copy that whole query, and paste it into Splunk for a second here so you can see

that there are some results — I always forget to change the time range — and here we go. There are results; that's what we expected. And just for clarity, those were two different Splunk queries to detect the same thing: it generates the query however it wants, so there was some difference in how it goes about detecting it, but the results that come back are fed into a summarization of those results. What it says here is that the query is meant to detect Kerberoasting, and the results show that khal.drogo and the SQL service are somehow correlated in this Kerberoasting attack, so based on this it's recommended that you do further analysis

and so on and so forth. And lastly, on the left side there you see "search local vector data store." Literally all that does is prepend the text string "local search" as a prefix to the input of that search function — that's the only difference — and that lets the LLM decide which tool to use, internet or local. So now it's going to use the local vector data store and search that BSides Augusta Splunk blog post, which is completely fictitious — nothing in it actually exists — and go through the same chain. The only difference there was the research step, and as it goes through it's now

looking for relevant information, generating that task list for itself, and we can fast forward a bit because Andrew showed you pretty much the same thing earlier in the talk — whoops, let's go back a little, get the results, it's finishing up, and there we go. Like I said, it was fine. And these aren't going to be the most efficient Splunk queries, either — that's the conclusion we've come to. They're not doing any accelerated data model searches; that's potential future work. But you're not going to say, hey, do a tstats on this because I have 10 billion records, and you're probably not going to want to search across all time

with some of these Splunk queries, that's for certain. But it's more the concept that's being proven here: it goes out, looks at the semantic similarity of that document, pulls back contextually relevant information, builds a query based on that information, and those queries then run with very minimal changes. It gets the results back, and the summarization of those results is, hey, it successfully detected these instances, we think you need to do some more research, and also the Win 10 host is a big red flag for you. So that's pretty much

the demo. Some of the things we took inspiration from were BabyAGI and AgentGPT. If you're looking for a good blog post, AgentGPT on Reworkd AI's site has a great one. BabyAGI — it's supposed to be a baby automated general intelligence — is a great project that's out there. It really helps you understand because it breaks things down, and it's easier to read their code than to try to go through something like AutoGPT or AgentGPT. Their code is really well written, and there are a million forks

off of BabyAGI, including easier-to-understand projects; BabyCoder is one example that's out there — is this still working? Yeah, BabyCoder is an example that's out there. And then Splunk AI — obviously, as you guys saw, we totally stole the idea from them, so that's a great piece of information. Some of the future work: minimizing the number of prompts we have. Our prompts are fairly long; to get where we wanted, we had to add those extra agent steps, so it would help if we were able to build more efficient prompts. Prompt engineering is hard; it is definitely

a skill that is sought after in the real world, so if you're really good at building those prompts you can probably make some good money. And then just continue testing and refining. We have the project up on Andrew's GitHub, so you're able to take a look at what we've got — don't make fun of our coding abilities too much. But yeah, I think we have some questions we're supposed to ask to give out some prizes here, and then we'll try to take some questions. I think we talked really fast, so we're a little ahead of schedule. But Andrew, I think you've got — I

can think of the first question. Can anybody give — for this blue Nomicon book — can anybody give an example of a hallucination that you've seen in the real world? What have you got back

there? I guess — yeah, I guess.

Yeah, I think that's fair — that's a good example. Obviously the funny one, the one I tried to reference and that was also mentioned in the keynote, is the New York attorney who had a great time trying to explain to a judge why he thought ChatGPT was 100% right, and the judge was wrong, and all of the case law was wrong. All right, for mine I've got the Easy Pickings lockpicking set and guide — I'll give this to anyone that has a question for

us. All right — in the

back

Yeah, so the question was whether our employer, SIXGEN, paid for us to do this by sponsoring us, and whether this stemmed from some previous engagement where the company had asked us to develop something like this. The answer is no — SIXGEN didn't say, hey, we want you guys to go give a presentation. They do strongly encourage and provide time to conduct this research, but it wasn't like they said, hey, we have this task, we need someone to go create an agent to write Splunk queries. We do have a tool that is being

utilized right now — essentially, we have a tool called the Raven. It's a red-team deployable kit, and one of the things they're working on right now with an LLM — except it can be a local instance — is searching all of the documents we have for using the server itself and troubleshooting it, through a chat agent: you just go to Mattermost and say, hey, how do I do this, and it'll read all of the documentation using a vector database. That's something we've helped out with a little, but it's not an immediate ask from the company; it's been more of a side project. Jake messaged

me — he's like, "I've got a great idea," and I was like, oh no, this can't be good. So that's where this kind of headed, but good question.
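The documentation-chat idea just described — embed your internal docs in a vector database, then answer "how do I do this" questions by pulling back the most similar passage as context for the LLM — can be sketched in a few lines. To be clear, this is a hedged illustration, not the Raven tool's actual code: the bag-of-words `embed` below is a toy stand-in for a real embeddings model, and the two document snippets are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embeddings model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector store": each document stored alongside its embedding.
docs = [
    "to restart the server run the restart script as root",
    "check the logs directory on the kit for troubleshooting output",
]
store = [(d, embed(d)) for d in docs]

def retrieve(question: str) -> str:
    # Return the most semantically similar doc, to feed the LLM as context.
    q = embed(question)
    return max(store, key=lambda pair: cosine(q, pair[1]))[0]

print(retrieve("how do I restart the server"))
# → "to restart the server run the restart script as root"
```

In practice you would swap `embed` for a real embeddings model (OpenAI's or a local one) and hand the retrieved passage to the LLM together with the user's question.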

I feel like the most interesting challenge was that everything moves so fast. You're trying to work down one process, thinking, all right, I want to use this LangChain method, I want to bring in this technique for this prompt-engineering approach or whatever — and then all of a sudden, a month later, LangChain is like, hey, great idea, we've now got 17 different things you can look up and use. The speed at which this world is moving right now is a unique challenge that I don't think we've seen in many other places;

there's always something new and something better — or what you think is going to be better — coming out. That's definitely a challenge I don't think we expected going into it, especially since January, so, what, ten months? That should have been easy math, but I went to a state school. What else — any other questions? Yeah — so the question was, are the Jupyter notebooks available on our GitHub? The answer is yes: the presentation, the demo, the Streamlit application, as well as the Jupyter notebook. And then we have another

Jupyter notebook there which is just the Streamlit application, only not in Streamlit. In the event that you don't really like Streamlit — it's a little finicky, and it's hard to manage those sessions — you have the ability to copy and paste straight from that other Jupyter notebook and not worry about visualizing it in a pretty web app. Any other questions from the group? The question was, have you tried incorporating internal intelligence? I think Jake actually had an idea for

this. I mean, yeah, to a degree — that was kind of the point of that fake blog post. We literally wrote up a 500- or thousand-word nonsense post saying, hey, BSides Augusta supply chain attack, this, that, and the other, and that stood in for the proprietary information; it was the proof of concept for proprietary information. It was stored locally and embedded locally — well, I guess we used the OpenAI embeddings model, so they have all that information too, right, in theory. But with that said, it could have been any model you want, because we used LangChain, which is great: you just say llm equals whatever

you want, and whatever embeddings model — there are local embeddings models — and that's how we were going about the process of using local intelligence, the local proprietary information you would want to embed and then retrieve. That does get kind of expensive — again, computational cost: anytime you add another document, you're going to have to embed your entire document store again and again and again. So there are challenges to that method, fully accepted, but that's just the way we were going about it.
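On that re-embedding cost: one common mitigation — a hedged sketch, not something this project implements — is to cache each chunk's embedding under a hash of its content, so adding a document only embeds the chunks that are actually new. `fake_embed` below is a placeholder for whatever embeddings model (OpenAI or local) you would plug in.

```python
import hashlib

def fake_embed(text: str) -> list[float]:
    # Placeholder for a real embeddings-model call (OpenAI, local, etc.).
    return [float(b) for b in hashlib.sha256(text.encode()).digest()[:4]]

cache: dict[str, list[float]] = {}  # content hash -> embedding
calls = 0  # counts how many times the "model" was actually invoked

def embed_store(chunks: list[str]) -> list[list[float]]:
    # Only embed chunks we have not seen before; reuse cached vectors.
    global calls
    vectors = []
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode()).hexdigest()
        if key not in cache:
            cache[key] = fake_embed(chunk)
            calls += 1
        vectors.append(cache[key])
    return vectors

embed_store(["doc one", "doc two"])               # embeds both chunks
embed_store(["doc one", "doc two", "doc three"])  # embeds only the new one
print(calls)  # → 3 model calls total, not 5
```

The trade-off is an extra hash lookup per chunk in exchange for skipping every unchanged document on re-index, which matters once the store is large or the embeddings model bills per token.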

Yes — well, I will say I'm very thankful to have bought $22,000 worth of Cisco calls — or Splunk calls — before the acquisition, so I'm a millionaire now. No, I did not; that was not me, for any of you who saw that on Twitter. I think the Splunk AI direction they're going — as you saw, that blog post was really recent; it was in July that they put it out and showed the Splunk AI Assistant — I think it's going to be significant. I think with Cisco's acquisition they're

probably just going to have more resources. I'm not really sure how the internal corporate structure will look once Splunk is bought out, or if that actually happens, but

yeah. What else have we got — anything else? Well, definitely thank you guys for coming out. I know that was the last talk of the day; you were probably ready to go home, sleepy, got here early. We're ending about ten minutes early, so if there are any questions you didn't want to ask in front of the group, we're up here. Otherwise, thanks for coming out.