
Let Me Do It For You: Automating OSINT and Recon - Paul Zenker

BSides Munich · 2023 · 24:09 · 247 views · Published 2023-10
Style: Talk

Transcript [en]

Since we're making Bible jokes today, I've prepared one for the introduction of my talk. When God created the universe, he had no apparent reason to do it; it was just for the joy of creation. And he knew it would take an immense amount of work to create it and to maintain it. And I understand him, I honestly understand God, at least in that part: creating is a lot of fun, and maintaining that stuff is a lot of fun. So I want to give you some ideas, tools, structures and architectures for creating some fun stuff of your own in the OSINT and recon automation area. I stole that label from Daniel Miessler; I found it pretty cool. The presentation is human-created, but it has some major AI augmentations in the source code and stuff like that, just so you know.

About me: I'm Paul, I'm an IT security analyst with Inside Tech Logic, and we are based here in Munich. I'm an Offensive Security Certified Professional (OSCP). You have my email there, and you can read some of my blogs; I will publish something about this at securitybyaccident.com.

Agenda: we want to talk a little bit about why you should use automation, which was already in the introduction, I guess, and where you can use it. It's a basic-to-intermediate talk, so if you're already coding a lot of OSINT projects and tools, this might not be the talk for you. I want to help you come up with ideas, so basically the thing they tried with artificial intelligence in the beginning: I want to give you a lot of knowledge, and I hope this knowledge will interact. Let's see how that works. I want to help you come up with structures for your own OSINT tools, and I want to showcase some tools, libraries and concepts. And of course, it's post-ChatGPT, so there has to be some AI, and because this is a security conference, we will have some comics and memes. Let's get right into it.

Automation: XKCD summed it up, as always, pretty much in this comic here.
In theory, you have an initial effort for writing some code, and later the automation takes over and you have a lot of free time; you don't have to do anything anymore. In reality, most of the time the coding takes longer than you think. It's always "okay, I'm going to hack this together in one hour", and then it takes five days, and then there's debugging, and then you have other ideas, and then there's ongoing development because the Docker container isn't running properly and all that stuff. So keep that in mind: it can have either of those outcomes, or something in the middle.

So where should you use automation? At least on the information technology side, you should treat it like an intern: give it a task that an intern could do. Getting coffee is hard to automate, sorry. It should be a repeated task, so that you can really reuse the code and it pays dividends, and it should have almost no moving parts if you want it to be easy to maintain. There are some Instagram and Twitter OSINT automation tools, and Instagram and Twitter, or X nowadays, frequently change their APIs and limits and stuff like that, so these tools break all the time. Just keep that in mind.

Basically, what you're doing when you're doing OSINT is taking a huge volume of data from the big interweb and funneling that data through some processing to increase its relevance. So you have some data collection, some data cleaning, maybe you store your data somewhere, then there's some data presentation, and then all of that intelligence goes right into your brain and is really relevant, if you've done it right. As the symbolic picture here suggests, you will do it all in your kind of spaghetti-looking code, at least in the beginning, and I haven't moved beyond that phase even now.

How? Mainly there are three ways to do it. You can use tools; there are a lot of OSINT tools out there, but
it's not always the case that they fit your specific use case. You can code stuff yourself, if you can code, which I would advise you to learn. That gives you a lot of freedom, but you also have to take care of all the maintenance and all that stuff. And you can use automation frameworks, which are kind of the middle between the two: you have some kind of framework, but you also have flexibility. You have to do some stuff yourself, but you already have some structure for it.

I will be talking about Python a little bit in this presentation. Another XKCD comic: with programming languages it's like with editors. You can use nano, Emacs, Vim, ed, cat, or a magnetized needle and a steady hand. I wouldn't advise any of those for coding OSINT automation. If you can do it with the magnetized needle, talk to me after the presentation; I would like to see that. Python has a lot of libraries, and I want to go quickly through some of them so that you maybe get some ideas where you can use them. In my presentation there are code blocks. Why did I put them there? I just wanted to show you that they fit on the slides, which I think is pretty nice: a lot of the Python code that you need for basic automation can be done in a couple of lines, and it fits on a slide.

For OSINT, you will probably encounter APIs at some point. They allow you to stand on the shoulders of giants: APIs have a lot of data, and there's basically an API for everything you need. Sometimes it's free, sometimes it's premium, sometimes it's hella expensive, but they are amazing for open source intelligence. In Python you basically have two choices. You can take a library, like the Shodan library for example, and do a quick Shodan lookup for the Apache web server and get some results for Apache web servers on the internet. But you could also do that via the native API call. So if you're just using one or two API
endpoints, it might be better to do it with the native API; if you're working a lot with that specific API, you might consider the library. You can also use Python's requests module to get the HTML of a website, which is pretty cool for building your own kind of APIs and for looking at websites programmatically.

Regex was mentioned in another presentation today. You can find anything anywhere in text. It's really powerful, it's also really confusing, and I would advise you to use AIs for this. Here you have a regex that checks for opening and closing brackets and all that stuff; it's what came up when I asked ChatGPT to create a confusing regex that illustrates how powerful and confusing it is. You can see it's really useful for a lot of tasks.

Chatbots, or Telegram integration, Discord integration, Slack integration: really powerful in Python. You can use them to start things and to get notifications and alerts. What I've done here (I don't know if it's secure; I know this is a security conference) is start a script on my server from a Discord chatbot. It doesn't take any arguments, so I'm feeling kind of safe, but probably some of you would be able to break it. Pretty interesting, pretty powerful. What I'm currently working on in this area is being able to do bug bounty from your phone: you have a bunch of Discord bots, and you can start them for certain domains and stuff like this. There's a lot you can do with this, including notifications and alerts: if you have some long-running tools in the cloud, you get notified when they are done, and you don't have to sit in front of your computer.

You can also use search engines from Python, which is one of the most amazing things. It's like Google dorking, but on steroids. For example, you can test a dangerous dork for some domain you're researching or
some name you're researching. If you really narrow that dork down, you can check like a hundred of them, see if one of them has results, and go back to them later. It's pretty easy. Again, there's a library for DuckDuckGo, for example; it doesn't have the strict rate limiting that Google has, and you can search for stuff in there. Pretty interesting, really scary, really cool stuff.

You can do facial recognition and natural language processing in Python, which is just amazing; when I first started, it blew me away. There are libraries for a lot of use cases here, and they make really advanced use cases possible. For example, this is TextBlob: it's like four lines of code, and you can analyze the sentiment of a text, and one line of code, no, two lines of code, are the actual text. So it's really cool. You can get the sentiment of a text and see whether it's positive or negative about something, and with this you can analyze tweets, social media posts, articles. You can build really cool tools with that if you want to see how the mood about a topic changes, or if you want to get alerts when there are spikes or variations.

Flask is just for building your own APIs; again, you see the code here for a simple API. It helps you make tools available. I have to warn you: security isn't straightforward in Flask, so if you want to do API keys and so on, you have to look into the details. Don't just expose confidential or resource-intensive APIs to the internet. It's great if you're working in kind of a microservice architecture, because then you can make all of the tools interact.

I'm talking about APIs, and APIs might be expensive sometimes, but if you don't want to pay for APIs, you can just do web scraping. Here you can use Beautiful Soup for getting content from web pages based on their HTML and CSS, and Selenium for automating browsers, and you can become your own API.
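The talk contrasts using a wrapper library with calling an API natively. As a rough sketch of the native route, the code below queries Shodan's REST host-search endpoint using only the standard library. The environment variable name `SHODAN_API_KEY` and the helper names are assumptions for illustration. Searches may consume query credits depending on your plan, so check Shodan's API documentation. The official `shodan` package wraps the same call as `shodan.Shodan(key).search("apache")`.

```python
import json
import os
import urllib.parse
import urllib.request

# Shodan's REST endpoint for host searches.
SHODAN_SEARCH_URL = "https://api.shodan.io/shodan/host/search"


def build_search_url(api_key: str, query: str) -> str:
    """Build the GET URL for a Shodan host search, e.g. query='apache'."""
    params = urllib.parse.urlencode({"key": api_key, "query": query})
    return f"{SHODAN_SEARCH_URL}?{params}"


def summarize(results: dict) -> list[tuple[str, int]]:
    """Reduce a host-search response to (ip, port) pairs."""
    return [(m["ip_str"], m["port"]) for m in results.get("matches", [])]


def search(query: str) -> list[tuple[str, int]]:
    """Run the search over the network (needs a valid key in SHODAN_API_KEY)."""
    # Read the key from the environment instead of hard-coding it in the script.
    api_key = os.environ["SHODAN_API_KEY"]
    with urllib.request.urlopen(build_search_url(api_key, query), timeout=30) as resp:
        return summarize(json.load(resp))
```

Calling `search("apache")` with a valid key returns `(ip, port)` pairs. Keeping URL building and response parsing separate from the network call makes both testable offline, and it is the kind of small, one-or-two-endpoint use where skipping the library is reasonable.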
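Instead of a deliberately confusing regex, here is a small example of the everyday OSINT use the speaker alludes to: pulling e-mail addresses, IPv4 addresses and CVE IDs out of scraped text. The patterns are simplified assumptions (no IPv6, no full RFC 5322 e-mail grammar) meant for triage rather than validation.

```python
import re

# Deliberately simple patterns: good enough for triage, not for strict validation.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"  # 0-255 without leading zeros
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def _unique(items):
    """Deduplicate while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_indicators(text: str) -> dict[str, list[str]]:
    """Extract e-mails, IPv4 addresses and CVE IDs (CVE IDs normalized to upper case)."""
    return {
        "emails": _unique(EMAIL_RE.findall(text)),
        "ipv4": _unique(IPV4_RE.findall(text)),
        "cves": _unique(m.upper() for m in CVE_RE.findall(text)),
    }
```

All groups are non-capturing, so `findall` returns whole matches. Asking an AI assistant to extend a pattern is fine, but test it against a few known-good and known-bad strings before trusting it.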
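For the "tell me when the long-running tool is done" case mentioned in the talk, a one-way webhook is simpler than a full chatbot. This sketch swaps the speaker's Discord bot for a Discord webhook (which accepts a JSON body with a `content` field). The webhook URL is a placeholder you create in a channel's integration settings, and the custom User-Agent is a precaution based on the assumption that requests carrying urllib's default agent may be rejected.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Prepare a POST to a Discord webhook with the message as JSON 'content'."""
    # Discord limits message content to 2000 characters, so truncate long tool output.
    body = json.dumps({"content": message[:2000]}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical agent string; set one so the default urllib agent is not used.
            "User-Agent": "recon-notifier/0.1",
        },
        method="POST",
    )


def notify(webhook_url: str, message: str) -> int:
    """Send the notification and return the HTTP status (204 means accepted)."""
    with urllib.request.urlopen(build_webhook_request(webhook_url, message), timeout=10) as resp:
        return resp.status
```

A scan script could end with `notify(url, f"subfinder finished: {count} hosts")`. Unlike a bot that starts scripts on your server, a webhook only sends, so there is no inbound command surface to secure.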
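The dorking idea from the talk (generate many narrow dorks for a target, keep only the ones that return anything, revisit those later) can be sketched independently of any particular search library. The dork templates and function names below are assumptions, and operator support differs between engines. `search` is any callable that returns a list of results, for example a thin wrapper around a DuckDuckGo client such as the `duckduckgo_search` package, whose interface has changed between releases, so check its documentation.

```python
import time
from typing import Callable, Iterable

# Hypothetical templates; adapt the operators to the engine you query.
DORK_TEMPLATES = [
    "site:{domain} filetype:pdf",
    "site:{domain} ext:env OR ext:log",
    'site:{domain} intitle:"index of"',
    "site:{domain} inurl:login",
    '"{domain}" password',
]


def build_dorks(domain: str, templates: Iterable[str] = DORK_TEMPLATES) -> list[str]:
    """Fill every template with the target domain."""
    return [t.format(domain=domain) for t in templates]


def dorks_with_hits(
    dorks: Iterable[str], search: Callable[[str], list], delay: float = 2.0
) -> dict[str, list]:
    """Run each dork through `search` and keep only those that returned results."""
    hits: dict[str, list] = {}
    for dork in dorks:
        results = search(dork)
        if results:
            hits[dork] = results
        time.sleep(delay)  # pause between queries to stay well under rate limits
    return hits
```

Passing the search function in keeps the logic testable with a fake and lets you switch engines when one starts throttling you.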
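The talk shows TextBlob scoring sentiment and suggests alerting on spikes. A minimal sketch of that idea, assuming posts arrive in time order: `polarity` uses TextBlob's `sentiment.polarity` (a value from -1.0 to 1.0), and `find_spikes` is a hypothetical helper that flags scores far from the trailing average. The window size and threshold are arbitrary defaults, not values from the talk.

```python
from statistics import mean, pstdev


def polarity(text: str) -> float:
    """Sentiment polarity in [-1.0, 1.0] via TextBlob (requires `pip install textblob`)."""
    from textblob import TextBlob  # lazy import: find_spikes works without TextBlob

    return TextBlob(text).sentiment.polarity


def find_spikes(scores: list[float], window: int = 5, z: float = 2.0) -> list[int]:
    """Return indices whose score is more than `z` population standard deviations
    away from the mean of the preceding `window` scores."""
    spikes = []
    for i in range(window, len(scores)):
        past = scores[i - window:i]
        mu, sd = mean(past), pstdev(past)
        # max(...) guards against a zero spread when the window is perfectly flat
        if abs(scores[i] - mu) > z * max(sd, 1e-9):
            spikes.append(i)
    return spikes
```

Usage: `find_spikes([polarity(p) for p in posts])` returns the positions of posts whose tone departs sharply from the recent mood; those indices could trigger a notification.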
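To illustrate "becoming your own API" with Beautiful Soup, this sketch turns raw HTML into structured data: the page title, absolute links and `mailto:` addresses. The function name and output shape are assumptions. Fetching is left out so the parsing can be tested offline, and the terms-of-service caveat from the talk still applies to whatever you feed it.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_page(html: str, base_url: str) -> dict:
    """Return the title, absolute http(s) links and mailto addresses found in HTML."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    # urljoin resolves relative hrefs and leaves mailto: links untouched
    links = [urljoin(base_url, a["href"]) for a in soup.select("a[href]")]
    return {
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "links": [link for link in links if link.startswith(("http://", "https://"))],
        "emails": [link.removeprefix("mailto:") for link in links if link.startswith("mailto:")],
    }
```

In practice the HTML would come from something like `requests.get(url, timeout=10).text`, or from Selenium's `driver.page_source` when the page needs JavaScript to render.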
There are a lot of blogs about how you can even bypass CAPTCHAs and rate limiting and all that stuff. With that you can really become your own API and scrape basically any web content that is out there, even if they don't offer an API. Just a fair warning: it might violate terms of service.

Storing data: "I think we should build an SQL database." Maybe you've heard about SQL if you're here. So let's get into some database options. When you're doing open source intelligence gathering, the amount of data you have to deal with can get really big really fast, and then you want to move beyond storage options like text files or JSON files. Although they work for a really long time; you might be surprised how long you can use them as data storage.

SQL, the most obvious option, has run basically the entire internet for almost three decades now, or whatever it is. It gets you where you need to go, it's well documented, and you can play around with ChatGPT and AI on it; it's really cool. MongoDB is more document-oriented, more JSON-style; it's really cool for large text content if you want to play around with that. And one of my recent favorites, Neo4j, is a graph-based database that really allows you to visualize relationships. The visualization gets messy really quickly, as you know if you have ever run a big BloodHound scan, or if you try, as I did recently, to enumerate all of Tesla's subdomains and technologies and open ports and status codes; then the visualization breaks. But you can still get relationship queries back, which is really interesting for open source intelligence. And there's a database for basically everything: you have time series data with InfluxDB, you have in-memory databases with Redis, you have cloud provider DBs from Amazon and Azure. There's just a DB for everything.

The cloud: this ominous computer in some guy's basement that is kind of running the entire internet with a
lot of caching, as long as nobody trips over the cable. You want to deploy your tools somewhere, and you just don't want to keep them running 24/7 on your personal device, so I suggest you get a server or use some of the cloud stuff.

Really useful in this area is Docker. It's basically kind of a really small virtual machine. Almost all of the available tools have Docker images, which saves you from dependency conflicts and all that stuff, and it's really easy to install tools with Docker. Those are the instructions for installing Subfinder, kind of summarized on the right side, and down here is the Docker command to do it. It's really powerful: if you build a Docker image for your own tools, it makes them really easy to deploy everywhere.

If you're managing Docker in the cloud, I would suggest you use Portainer. It's Docker with a GUI, self-hosted, and just beautiful. I host all my tools on a Portainer instance, and I can just spin stuff up and spin it down. Yesterday in another workshop, I just spun up a Docker container and built a scanner on it, kind of hacked together in 20 minutes. It's really amazing what you can do with this stuff, because you can also open a shell in those containers, and it works on every device because it's in a web browser.

If you don't want to rent your own server, I suggest you use something like Render, which allows you to host websites, APIs and databases for free. It has GitHub integration, so you just push something to your GitHub repo and it is immediately available in the service you're deploying. It's not really serverless, because there are obviously underlying servers, but it's what they call serverless. You can also host databases in the cloud; a lot of database providers have a free tier, like MongoDB Atlas or Neo4j. It reduces setup and maintenance, but you obviously have to trust the database provider, and that's on
you. If you're just crawling open source data, then it might be okay. If you're storing a lot of data, the free tier is used up pretty quickly.

Another one I find just really amazing is Grafana. It allows you to build charts and dashboards; you can basically give your application a front end, because you can use HTML and JavaScript in there, so you can build basically anything in this dashboard architecture. What you see here is one of the tools I built: it crawls some cybercrime forums and shows the word frequency, some leak databases, the word frequency over time, and some CVEs they are talking about. If you're interested in that project and want to develop it with me, talk to me afterwards; it will probably be a future conference talk somewhere.

And of course you have the real cloud, like AWS, Azure and so on. Be careful, though: if you leave the wrong EC2 instance on, it might drive you into poverty; there are some funny threads on Reddit. There's a cloud service for everything; you can even run satellite ground stations on AWS, it's crazy. You just have to watch the cost, and the scaling goes to the moon; you can run everything on there.

Just quickly talking about AI: ChatGPT is really cool for coding. Give it small, incremental tasks and iterate on them. It can even do web scraping: if you give it the HTML of a page, it can create a web scraper for you. It can help you with architecture design. If you're concerned about privacy or API costs, there are local models like GPT4All and so on that you can run on your device. They're not as good as GPT-4, of course, but they are surprisingly resource-efficient; you can run them on almost every end-user device that we hackers possess. And you can obviously become your own Sam Altman and train your own AI model on your own data. I'm not an expert on it, I've just played around with it, but it's pretty interesting, so take a look at
this. Since this is a security conference: if you're publishing tools, don't store keys, don't trust every library out there, don't trust all of ChatGPT's code, and don't forget about insecure test systems. I can sing songs about how I forgot about this or that server and it got hacked.

Tying it all together: the question I'm asking myself right now is, can there be a framework that ties all of this stuff together? We developed OSINT Compass in a Bellingcat hackathon, so check it out on GitHub if you want to help develop it; it's been dormant for half a year because nobody's using it. And if you're building OSINT or recon tools after this, please send me your work via email, X, carrier pigeon, whatever you have.

Some ideas to get you started building: if you have a crawler on Docker, put the data into a database in the cloud and then visualize it with a Grafana dashboard; you can do some pretty amazing stuff there. Or a Discord or Telegram chatbot that has some scanning tools on Docker that you activate via the bot. Pretty interesting.

And I know, since this is about OSINT automation and there are great tools out there, just give me a couple of minutes, or a couple of seconds, to do some speed dating over some tools. There's a list here for you to look into. SpiderFoot, one of my favorites, can do a lot of data crawling; it's amazing, it has Python modules that you can extend, and it's open source, crazy. ProjectDiscovery has a lot of bug bounty automation tools for recon and for scanning; really cool. n8n is one of those automation frameworks where you can kind of click your stuff together, and you can self-host it. Apache Airflow is another one of the automation frameworks. BabyAGI is pretty cool: it's kind of an AI that iterates on tasks you give it, prioritizes them, and has some execution agents. LangChain is for chaining together AI modules with other stuff; really interesting.
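The getting-started idea above (a crawler writing into a database that a Grafana dashboard reads) and the forum word-frequency tool described earlier can be sketched with SQLite from the standard library. Table layouts and function names are assumptions, not the speaker's implementation. Grafana can read SQLite through a community data-source plugin, or you could swap in PostgreSQL, which Grafana supports out of the box.

```python
import re
import sqlite3
from collections import Counter
from typing import Iterable


def init_db(path: str = "osint.db") -> sqlite3.Connection:
    """Create the tables if needed and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               url  TEXT PRIMARY KEY,   -- dedupe: re-crawling the same URL is a no-op
               day  TEXT NOT NULL,      -- ISO date, e.g. 2023-10-14
               body TEXT NOT NULL)"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS keyword_counts (
               day TEXT, keyword TEXT, n INTEGER,
               PRIMARY KEY (day, keyword))"""
    )
    return conn


def store_post(conn: sqlite3.Connection, url: str, day: str, body: str) -> bool:
    """Insert a crawled post; return False if the URL was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO posts (url, day, body) VALUES (?, ?, ?)", (url, day, body)
    )
    conn.commit()
    return cur.rowcount == 1


def update_keyword_counts(conn: sqlite3.Connection, keywords: Iterable[str]) -> None:
    """Recompute per-day counts of tracked keywords: a time series a dashboard can chart."""
    wanted = {k.lower() for k in keywords}
    counts: Counter = Counter()
    for day, body in conn.execute("SELECT day, body FROM posts"):
        # [\w-]+ keeps hyphenated tokens such as CVE IDs intact
        for word in re.findall(r"[\w-]+", body.lower()):
            if word in wanted:
                counts[(day, word)] += 1
    with conn:  # commit the rebuild as one transaction
        conn.execute("DELETE FROM keyword_counts")
        conn.executemany(
            "INSERT INTO keyword_counts VALUES (?, ?, ?)",
            [(day, kw, n) for (day, kw), n in counts.items()],
        )
```

A dashboard panel would then only need `SELECT day, keyword, n FROM keyword_counts ORDER BY day`. Recomputing everything is the simple choice; for large crawls you would update counts incrementally as posts arrive.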
Maltego is a graph-based investigation tool. There's a free version; it's not too easy to extend, but it is really powerful for getting started. Netlas.io is kind of a Shodan alternative; it's really cool if you want to look at this. search.0t.rocks lets you look into some breaches and leaks. start.me has a lot of OSINT resources; there are different pages by different people that you can look at. TomNomNom, originally more in the bug bounty scene, has a lot of text input conversion and asset-finding stuff out there on his GitHub repo; pretty amazing. Cipher387, who also runs the Twitter account Cyber Detective, has a lot of cool OSINT lists and stuff there. If you want to get into self-hosting, awesome-selfhosted has a long list of thousands of links. And if you're thinking this OSINT stuff is pretty interesting, you can come work for us at Inside Tech Logic: we're doing a lot of OSINT stuff, and we also provide OSINT training, and OSINT investigations if you want to see your own attack surface.

That's all I got. Sorry for being so speedy; it was a lot of content. The slides will be available later, and you can also come talk to me or message me. Are there any questions?

[Applause]

Interesting topic, Paul. Are there any questions?

Yeah, hi, thank you very much for this super cool presentation. How do you deal with Google dork limitations? Because if you dork a lot, it stops responding to you. Is that why you use the DuckDuckGo library, because it doesn't have the rate limiting?

Yeah, there's a Python library for DuckDuckGo, so you can just use that, if I can find it somewhere in here... yeah, here: there's duckduckgo_search, and you can use that.

Cool, thank you.

Yeah, it's amazing.

Yeah, so thank you for your talk. You mentioned quite a lot of libraries, and you're also saying there's a lot of cool stuff out there, but how do you actually verify, when you're using all these libraries, that not someone at