Why We Research

Name: Why We Research
Uploaded: 2024-11-12
Duration: 57 min 52 s

BSides PDX · 2024 · 57:52 · 182 views · Published 2024-11

Speakers

Marion Marschalek

Tags

Category: Research

Research: Methodology

Style: Keynote

About this talk

Day 1 Keynote - Why We Research
Marion Marschalek (@pinkflawd)

There is this picture of a security researcher in many people’s minds. Some dark clad figure dwells in a basement, surrounded by electronics, and then suddenly a few weeks later an ATM spits money at them. Wowz, lets go write exploits y’all. But is this how it works? Why do people go down these rabbit holes, how does big research come to life, and what if you’re not in a basement but back in some office, oh the horrors, and somebody says ‘time to production’? We’ll explore the question why security research matters, where the ideas come from and what motivates a proof of concept, and the big question: What comes after? Finding the bug is nice, but have you ever tried to patch 900 machines on a Friday night? Ever wondered how mitigations make it into a compiler, or how a machine learning model rolls to production? We’ll look at why research matters and explore what makes it significant.

Marion is a security engineer at a large cloud provider and enjoys reverse engineering and all things binary analysis. With some background in malware analysis, incident response and microarchitecture security, her interests are quite varied. In 2015 Marion founded BlackHoodie, a series of hacker bootcamps which successfully attracts more women to the security industry.

BSides Portland is a tax-exempt charitable 501(c)(3) organization founded with the mission to cultivate the Pacific Northwest information security and hacking community by creating local inclusive opportunities for learning, networking, collaboration, and teaching. bsidespdx.org

Transcript [en]

[Music] Welcome, good morning, welcome to my keynote. My name is Marion Marschalek. I'm a security engineer these days, and today I'll be talking about why we do research. I'm sure that most of you in the room have gone down one rabbit hole or another: found a bug, analyzed a piece of malware, got really invested in the subject. I know exactly how that feels, because I was one of you at some point, until I took a job as a security engineer. Now I'm on the side of this industry where my job is to fix things, to roll out detections, to build models, to manage patches, to, you know, make things secure. And I look back

at my time in research and I miss it, and I'm like: what the hell, people, could you stop finding bugs and publishing vulnerabilities on a Friday afternoon? You know very well there are some of us who spend the weekend fretting over a server somewhere on the other side of the planet that needs a patch and that you can't get to anyway. So today we'll be talking about both of those sides: what makes it fun to research, and what's the pain of the research as well. A little bit about myself: as mentioned, I've been in this industry for 15 years. I do not know how beamers work, but I know other things.

I've worked for a long time in malware detection. I'm a reverse engineer by training. I've worked in antivirus; it's a fun field, do not recommend. I've worked in incident response. I've been one of those Ghostbusters going into hospitals trying to figure out why ransomware took down their entire system. That's exciting; I can very much recommend that field of work. I've also worked in offensive security, a few years at Intel. I'm sure there is a good portion of the room who also work for Intel. Hi, welcome, miss you guys. And finally, these days I work for a large cloud provider, and because that's a large company, I need to tell

you that I do not speak for my employer, and opinions presented here are my own. So these days I work in security engineering. I'm one of those engineers who works with the research that people put out, and let me tell you, we have the wildest ideas of what those people look like. Also, let me ask you: have you ever had this conversation with your grandma, when she asked you what you really do in your line of work and you had to describe it? Yeah, I'm sure that's what she thought about after. There's a long list of really funny stock photos about security research on the internet. I thoroughly recommend you go and check those

out. Really though, what does a security researcher do? Mostly, researchers come in with this curiosity: you want to take the thing apart and figure out how it's working. That's my spiel. I'm not really an exploit developer; I just want to look at the thing very closely and see how it works. That's what gets me excited. What gets other researchers excited is to break the thing. So there's this new security camera: can we make the camera, I don't know, capture Disney characters instead of people? Wouldn't that be an interesting research field? People smile and nod, like, yeah, that's what we want to do. Does it have any purpose?

Probably not, but it's fun, right? I also have this very big, very important graphic. It's probably the most colorful slide I have here. That's all the words I could think of when I think about security: what are the different fields that people work on? There's exploit development, there's malware development, there's phishing. But then there's also stuff like GDPR. Has anybody in the room not heard of GDPR? I spent months and months of my life worrying about GDPR. There's also UEFI security, there's anomaly detection, backdoors, there's zero-day markets. Yes, there are people writing exploits to sell them to other people to break other people's devices. We don't really know who that would be. It's a

very dubious field of work. But there's this big cloud of subjects that we can work on: either go break things or go fix things. And research is at the very base of most of these fields, if not all of them. Somebody needs to find the bug so there can be a 0-day exploit, which somebody then either writes a patch for or, you know, sells to a vendor of these things. That's what we're going to be talking about today. Why do we do research? Again, there's rabbit holes. There's something called hyperfocus. I don't know how many people in the room have heard of ADD. I'm one of

those people who can go down a rabbit hole and spend day and night on a subject, forgetting to eat, forgetting to shower, forgetting that there are other people in my life, and it's wonderful. Granted, it sounds awful, but that's the joy of the unknown. I'm a reverse engineer. I can grab a binary, spend a week on that binary, and forget whatever else is going on in my life, because it's so much fun to dig around and find things out. If we go down that list of what the goal of research might be: does it really make the world a better place just because I understand how the ransomware works? Probably not, but there

are ways to take the research, to take those insights, and actually have a positive impact on society. So theoretically, we research a thing, we analyze the malware, we find the bug, we write the exploit, and eventually maybe there are fewer vulnerabilities, there are fewer malware infections. Because, right, we analyzed the malware, we came up with a signature, we roll that signature out to customers, if we work in antivirus or on a threat detection product, and eventually there's more security. Or is there? Let's have a close look. What does security research even mean? A long, long time ago I was at university and I had this science class. They didn't have us do science, but

they tried to teach us what science means: find a problem that hasn't been solved before, ask yourself a question that hasn't been answered, and come up with a plan for how to get answers to that question. Something along those lines; it's been a long time since I've been at school, but that's the idea. That's what research does. In security research, people don't necessarily have a problem or a question they ask themselves. They're curious; there's curiosity and nothing else. We, I don't know, go and dump the firmware of a security camera and try to figure out whether we can find a bug in there and exploit it. And that involves a process that's

rather fuzzy. Few of us really follow a structured research process. That's not all of us; I know some people in the room will object that yes, there is real science in security research. Most of the security research we do these days is not actual science, but that doesn't make it less important. What's interesting there is that most people looking into problems in that field have this try, fail, rinse, repeat process, and that, I think, is the best way to describe how this research works: you keep trying until you succeed, and then great things happen. What are these great things that could happen? I wanted to introduce some research results in this talk that

I found very cool, because there's actual real-world impact that security research can have. So you find a bug, you write a proof of concept, the developer picks up your report, fixes the bug, rolls out the patch, and suddenly that bug isn't there anymore. Wouldn't that be nice? That's not always how it works. But anyway, theoretically, wherever there are computer chips there should be hackers, right? So take locks on cars: people have hacked car locks. I think, I hope, somebody patched those problems. People have hacked pacemakers, and there's been legislation in that field to try to improve the security of pacemakers. That's an important thing. There has been legislation in IoT security. So IoT is a different subject.

I'll be talking about it a little bit later. It's a mess, but people try, and big points for that. But overall, remember what I said earlier: security research is at the base of any change that happens in the security field. Somebody needs to have analyzed the problem first before somebody can come up with a bigger solution. There are many, many targets out there, and there aren't as many of us; I wish there were. But somebody looked at pacemakers, and this is the first example that I want to present today. Security researchers found a vulnerability in a pacemaker product, and the FDA stepped in and said

this could actually harm people; this is a product that could kill people if it got hacked. So they pushed the vendor to patch their product, which is interesting. You would imagine that if your router firmware has a bug, somebody would step in and push that vendor to patch. That's unfortunately not always how it happens. But in this specific case, Abbott was the producer of that pacemaker product, and they had to patch the 500,000 pacemakers that were out there and vulnerable. The headline of this article is unfortunately misleading: there was no recall of those products, because can you imagine, how do you do a recall of

pacemakers? I don't know who wrote that. That didn't happen, but it leads us to a very real issue, right? There's pervasive computing happening these days. There's firmware in places where you wouldn't expect firmware to run, like in people's bodies. So how do you deal with that? If that bug were not patchable remotely, what would we do? 500,000 operations, remove the pacemaker, put another one in there? It gets very tricky if you look at these types of fields. Very interesting to work in as a researcher, but there's big power in your hands if you find that bug that's unpatchable inside people's bodies. Let's move

on with the scary hacks. 2015 was a long time ago. This article is from 2015; some of you might remember, others might still have been in school. In 2015, researchers Charlie Miller and Chris Valasek hacked a Jeep from their home, over the internet, while the Jeep was on the highway with a journalist in it. They hacked the Jeep, they modified the entertainment system, they changed the music that played on the radio, they messed with the transmission, they pushed the brakes, and they were able to modify the steering of that Jeep. Think about that. That's scary. That screams for legislation: somebody needs to step in and push car manufacturers to patch those issues,

and in that specific case, that was what happened: Congress stepped in and proposed legislation. I didn't go and read it to see if it makes sense, if it actually fixes the problem, but there was movement happening. In general, you would think that if you have products that can't fail because humanity needs them, like a pacemaker, like a car on the highway, there would be this form of action, that vendors would step in and say yes, we need to fix our stuff. If we move on to the next subject that I picked, power grids: if you ever looked at substation security, that is

unfortunately not the case everywhere. We need power, right? And that power is controlled by computers, like so many things these days, and those computers are unfortunately not as safe as we wish they were. There is ample research out there. There have been actual hacks on power grids; I think it was 2016 when Russian hackers took down a big part of Ukraine's power grid, if I remember that article correctly. The threat is real. I haven't found a lot of information about whether anything's being done about it. I'm sorry, I'm not a hardware hacker in that sense; I don't know much about that field, but I know people who work there, and they are

scared, so I am scared with them. There's one thing to say about hacking a power grid. We've watched the Ukraine war play out. I don't want to go into depth on that subject in any way, but from the cyber perspective, we expected the cyber war to go along with the physical war. You know what happened? Not much. Taking down a power grid through hacker activity is temporary. That is fixable: you can put in another server, you can change the chip, you can apply a patch, you can update the software. If you drop a bomb on the substation, the substation is not coming back. That's simply what happened when we looked at this Ukraine case:

taking out a power grid through hacker means is temporary. And this is a subject where somebody's going to come after you. If you go and turn the lights off for a country, they're going to have questions. So, reading that article that I posted here, that seems to be what people working in that field expect: somebody could take down the power grid, there just isn't a lot of motivation if there's no actual conflict going on. If you're a kid in some basement and go after the US power grid, you're going to get in trouble, and from this article I understood that that's why the grid is still largely

untouched. I'm not sure that's correct, but let's hope for the best. That's about infrastructure. Finally, plane security, question mark. So the thought goes: if you can hack a car... By the way, that car hack happened through a piece of software called Uconnect. If you have a modern car, you might have seen Uconnect boot up when you start the car. That's where the bug was. Uconnect allowed the attackers to control the car remotely, which means they were sitting on a couch at home with a laptop, on the internet, knowing the IP address of the car, and being able to get remote code execution in the

Uconnect component and the entertainment system, from there onto the CAN bus, and from there to other components in the car. Now what about planes? Turns out planes are hard to research, because you need a plane to do the research. That's why there isn't a lot published about that. But it just so happened that a handful of hackers came by a junkyard of decommissioned airplanes, and somehow they talked somebody, I don't know who would have control over that, into giving them access to those airplanes to do security research. So they were looking at those airplanes, like, all right, security by obscurity only goes so far, right? What they found there, though, what

this article describes, was that they found problems in the entertainment system. Remember, in the Jeep the entertainment system was how people got in. But it turned out that in those airplanes the entertainment system was 100% separate from the control system of the airplane. That gives me a little bit of peace of mind. I also happen to know a researcher who works for a large airplane manufacturer, and he confirmed that that's exactly what they do: there's physical separation between what a passenger can access from their seat, the entertainment system, and what pilots use to control the airplane. All right, peace of mind. We've all spent enough time in security to know that that doesn't

always work. It's a big, complex system, there are a lot of connections, and yeah, physical separation is great as long as it stays physically separated. So there's that. There has been no big airplane hack to date, yet. Also, it gets me back to the power grid: if you hacked the airplane, then what? Are you going to make it fly rounds in the sky? I don't really know what a hacker would do with a hacked airplane other than causing real big trouble. Most of us don't actually want to go and crash airplanes, right? But also, there is great research out there with great impact, and then there's a lot

of research that goes nowhere, and I've intentionally left the slide blank. Research is fun. I've been there; I've done a lot of research that went nowhere. Years ago, when I was at Intel, I was trying to put malware into SGX enclaves. You know what? It didn't quite work. It didn't make sense; nobody's going to launch malware inside of an enclave. But I did it, that was fun, and it went nowhere. And with nowhere I mean that there wasn't actually any impact. Now, does every piece of research have to change the world? God, no, let's hope not. We wouldn't be doing research anymore, right, if there was a requirement that everything

we do needs to go to production, somehow become a product, become legislation, or teach people to use safer passwords. That's just simply not going to happen. But there's some balance. There's a lot of research that has been fun, it's been published, it is out there, and then it comes to a security engineer who picks it up, looks at it, and says: okay, this is not helping us, let's move on to the next paper. And in doing so, that took up some time, it took up brain energy, and honestly, for me it takes up my will to keep going as a security engineer, because there's just so much of it. That is one of the core

topics of this talk. I'm 20 minutes in and I'm finally getting there. There's research that is helpful, that is useful, that is interesting, and then there's research that has impact, and the impact is what I want to be talking about today. So, cool: we found the problem, we found the bug, we analyzed the malware, we report. And then what should happen is that somebody mitigates. So say we have somebody who writes a patch, and then we automatically roll that patch out to all our thousands of machines that are globally distributed in our corporate network. Doesn't that sound great? That is an entire field of work in itself. I'm so happy I am not working in

vulnerability management. It is a pain, because there are so many vulnerabilities, there are so many patches, and sometimes the versions mismatch, and sometimes you don't really know what version of a software you're running and which patch you need to roll out where. There are people who develop products to help the vulnerability management people actually deal with that wealth of vulnerabilities and patches. Interestingly, talking to those folks, one of the biggest issues they have is similar to the challenge we face in threat intelligence: the information they get is too chaotic. Getting the vulnerability descriptions from different vendors or different sources means that you

have different types of descriptions that you then have to parse into place, so you have one feed of vulnerability descriptions that you can work with, that you can roll out to your customers, and that they can use to protect themselves. And it's interesting: how do we get there? A minute ago we were researching a bug, we found a vulnerability, and that was all fun, and suddenly there's this blob of information coming into a vulnerability management database, and people spend all day sorting out versions of software. That's not what we're supposed to do, but this is the reality. And the same goes for the other subjects on the slide. So, malware detection: I spent a

lot of time in my career in malware detection. We have the malware, we create the signature, we put the signature in some database, and then magically that lands on customer machines in time to protect them from this new wave of malware that's come out. And it's not bypassable, because the signature is so good that it detects all of that malware, and it doesn't detect any benign software at the same time. That's magic? That is hard engineering work, and it is frustrating to some extent; it's a never-ending job. We have automation in place nowadays, by the way, to create those signatures. If you have malware signatures for an antivirus

system, there aren't 500 analysts sitting there; it's automated. There's a whole machinery that goes into this automatic extraction of signatures, and testing of signatures, and beta testing of signatures on the customer side, and then actually rolling them out, and then the recall process once a signature fails because there are false positives. There's a whole industry that does that. I think they'd say they do other things too, but that's the bread and butter, and it's hard. So there was a little piece that we researched, and then there's this throng of engineering that we need to do in order to get the protection out. There are other things like intrusion

detection and anomaly detection; they all follow the same pattern. Is the research simple? Not really, right? Reverse engineering the malware is hard, finding the bug is hard, writing a proof of concept is hard, building a proof-of-concept machine learning model to do anomaly detection on a sample data set is hard. It's hard work, but it gets harder after you did the research and try to take that into production, to build a viable system to protect somebody. So will we mitigate after we did the research? Mitigations: I just put the slide here so you see the wealth of exploit mitigations we have these days. In a beautiful short stint at

Intel I spent some time prototyping mitigations. I can't give too many details about that either, but it was exciting. I learned how compilers work, I learned how mitigations are built, and I learned that they're difficult. Coming up with a mitigation that's not fixing a bug but fixing a bug class is really hard, because you look at so many different pieces of software, you look at so many different aspects of those binaries where you could place the mitigation, and you have to think about whether where you put the mitigation is really a concern for that type of exploit, for that type of bug, or not. Because every mitigation you build has a performance

impact. We can't endlessly keep stacking mitigations on top of mitigations on top of mitigations, because eventually the program doesn't run anymore because it has to do so much extra work. So yeah, designing those things is difficult. Do they work? Oh yeah, most of these mitigations have a lot of impact. Can they be bypassed? Oh yeah. That is one issue. The other problem is that they need to go places where they actually prevent an exploit from happening. If you follow me down memory lane for a second: there was a paper called Smashing the Stack for Fun and Profit. That's one of the first papers I read when I entered this industry, because it sounded

intriguing, and it was. It's really well written; it explains exactly how a buffer overflow bug is exploited. It was written 27 years ago. That's a long time; some of you might not have been around then. And you know what? It doesn't matter, because the problem's still there. In 2023... I just randomly looked at buffer overflow exploits that came out recently, and it was this one: Citrix NetScaler, which is not a small product by the way, had a textbook example of a stack-based buffer overflow that was exploitable in 2023. That is like a year ago. I'm sure there are newer examples; I didn't look for very long. And that's both entertaining and very concerning, like,

have you guys heard of non-executable stacks? The stack was executable. Isn't that like dreamland? If you start learning how to write exploits, that's what you want to find, and there are plenty of applications out there, plenty of environments out there, where this is still the case. So yeah, through security research we have gotten all these mitigations going, but they don't help anybody if they don't go places where they should be. And this is where a security engineer comes in, this is where product review comes in. They look at this and say: hey guys, there's this application that's running in this specific environment, and we need exploit mitigations in there, because there's some way for a

customer to interact with that environment, like an attacker. If we see that in security review, we would recommend doing these things. Whether they're done afterwards or not, that's a different question. But theoretically we have the solution for these issues; we just need to get them places. Another fun thing that I found with regards to this topic is that there are now companies out there paying money if you find bypasses for their mitigations. That in itself is another sport in this industry. I'm not an exploit developer. I have written one or two in the course of my career, and from what I remember, the biggest issue was not to trigger

the bug but to bypass all the different mitigations and issues that existed in the environment. But if you want an ASLR bypass for your current Linux system, go to GitHub; somebody will have published something that could work for your specific version. That was my experience. I never wrote a mitigation bypass; there's a lot of copy-pasting going on. But also, for some people that's their bread and butter, to research how to bypass these mitigations, and it becomes an arms race. A very slow arms race, by the way. So people develop mitigations, and then people develop bypasses, and then we develop more mitigations, and then we fix the next

bypass, and so on and so forth. It is wild. One of my favorite examples in that field, and please don't throw tomatoes at me, is the hardware vulnerabilities that were found in the Intel platform. There's a long list of those, we have an even longer list of mitigations for those vulnerabilities, and they tear the performance of your server down by 5 to 30%, and in some cases up to 39%, and that's a lot of money. I don't want to go into detail on these vulnerabilities; I can imagine there are people in the room who have emotions attached to those. I was at Intel at the time when they were discovered. I was not involved in working

on those, but I saw people being very worked up about these specific cases. And when I worked on mitigation prototyping, this was the field I was working in. I'm not to be blamed for the mitigations that are in production right now; I didn't actually develop production mitigations. But the research behind it, I was there. It is complicated, it is hard. That's why we have, I forget how many, there is a whole list. And up to Linux kernel, like, anything less than 5.1.13, if you wanted to disable any or all of those mitigations, you have this long list of mitigation switches that you need to put

in at boot time for those to not actually protect your computer from vulnerabilities that are really, really hard to exploit. I haven't found notice of in-the-wild exploitation of any of these vulnerabilities. They work in the lab; there's a lot of proofs of concept. Somebody might have uploaded an example exploit for one of them last year or the year before, I forget, so that's out there if you want to go try. But if you compare the impact that phishing and ransomware have had on the industry and then you look at the Intel hardware vulnerabilities... I wish there was a mitigation for phishing. We don't have

mitigations for phishing. But anyway, what I found entertaining: a few weeks ago my team at work got a presentation from a researcher from academia who proposed an approach to detect Spectre exploits, because the mitigations are so painful that, wouldn't we rather detect exploitation of this bug than mitigate against it? It's a design discussion; yeah, we can ask those questions. But what I found funny is that he showed us that now, if you want to turn all the CPU vulnerability mitigations off in your Linux system, you only need one key, because with kernel 5.1.13 they optimized and you only have to put mitigations=off, which I'm

like, it becomes entertaining at this point, right? How many more mitigations can we have? Now we have special keys just to turn them off. Which gets me to a different subject: security and usability. Usability is a great word in security engineering; researchers don't like to hear it. If we have all these mitigations, somebody needs to go and threat model whether that mitigation makes sense in the context where it's running. That means, if your server can't access a website, do we really need a mitigation for a vulnerability that can only be triggered through a web browser? These types of questions. Not every mitigation makes sense in every environment. But also, let's get back

to that Citrix application. They could have used a mitigation in their environment, and they didn't, because somebody decided it shouldn't go there. So it's a trade-off, and that's why we need security engineers who have skills on both the research and the engineering side to make these decisions and make them well. Also, there's this big and flowery phrase, secure configuration by default, on the slide. A lot of these problems that I've listed here, I'm working on some of those myself, and they're hard, they're difficult. You have teams and teams of people who look at that, like: so we want to protect this environment, can we do this without

having people have a degree in security? Can we have an administrator click a button, turn on protection everywhere, and get the information out of the system that they need to see? Can the information be self-explanatory? Can they look at the dashboard and know whether the company is secure or not? You go to RSA and you find a million startups who do exactly that. They sell you those dashboards; they're really into those dashboards. Building those dashboards is hard: coming up with a way to show metrics to a user that tell them the mitigations are where they need to be, they're mitigating the attacks that are coming in, threat detection is where it needs to be, it is finding threats, it is

not finding benign software. If you look at all these different pieces of information that you want to have if you want to know your organization is secure, it gets really complicated. And I've said this a lot in this talk: I wish I could go into detail about all these different items. How do we build a self-explanatory dashboard? Which information goes on there? How much of the information goes on there? What time frame are we looking at, what time frame is important? Unfortunately we don't have time to go there today, but think about it: if you were to define what you need to see in order to

know that the organization is secure you'd spend some time there coming up with the right things to put on there and you'd probably still be wrong and you have iterations and go back and forth It's product development is essentially what it comes down to and a lot of security whether you build a security product or not is product development it's like we have a patch we roll it out to like 900 machines that are globally distributed that's an engineering problem it's not necessarily a security problem that gets me back to my favorite subject that I've never worked on which is vulnerability management um I'll admit like back in the day when I was in security research it was like so why

don't you just patch you know everything just like that's what we do right there's a new vulnerability we go patch we move on with life it doesn't matter if there's like one bug or 50 and uh I learned better um I now work for a large company we're globally distributed we have a lot of machines in like different geographical regions and um getting to all of those machines in a timely manner to deliver that piece of software that's going to protect them from whatever threat is going on right now it is it's hard and I keep saying things are hard um you have different operating systems different applications you have different hardware if you roll out a hardware

mitigation sometimes the hardware mitigation is only needed on a certain type of CPU like a certain version of the platform but not others so somebody needs to make those decisions of where that goes we need to know which platform is out there like that's a piece of information that ideally you have before the bug hits and not like during I've been in those cases where um it's a Friday night the malware is running everywhere in the organization and you ask the administrator so like how many computers do you guys have like how many workstations are we talking about and they were like oh I don't know we don't have a plan of that how does your network

look like and they were like it organically grew over the past 20 years and if you're in the midst of this incident and you start writing up like which networks are we looking at which operating systems are we looking at it's going to be a very long night and a very long weekend and yeah then there's one more thing here that I really wanted to talk about is n-day exploits um in recent years a new industry has emerged like prior to that we had 0-day exploits which is like the jewels right if you have a zero-day exploit you can exploit whichever system with that software out there cuz like it will work anywhere because

nobody has a patch if you have an n-day exploit you can still exploit the systems that do not have a patch and it turns out there's a significant amount of systems out there that don't actually get those patches that we're talking about you say just patch it doesn't work in every environment for every person and so like n-day exploits have become their own little industry it's quite fascinating and n-day exploits is also a great spot to start writing exploits if you're interested in that because imagine there is like already an exploit out there you might learn from and that you can you know develop as well so there's a great uh learning opportunity for us in security

engineering it just means that like we need to be really really meticulous about where the patches go and when it's a timing game as well let's go into something I understand a little better threat research I spent most of my career in threat research to be honest and ideally what you would expect from threat research is either threat intelligence like we look at the binary and we know the IP addresses and domains and like habits that the thing has and like we put that into a TI database and that TI goes out to customers whoever consumes that feed and then they can actively protect their system and interestingly that is a research engineering um

relationship that works relatively well like it's relatively yeah there's issues granted but it's relatively straightforward detection development on the other side it's not that simple um granted threat intelligence isn't either like as I mentioned earlier with the vulnerability management if you have different data streams from different vendors different sources they almost never have the same format it's like you're ingesting data from different feeds and then you have like an entire team that's working on normalizing that data into your feed structure that you can use on your um on your products but let's talk about detection development for a second so back in antivirus I was one of those people like creating signatures I looked

at the binary and said this is an interesting function that's where you're going to get the signature from then we put that in a database and that database like automatically rolls it out to the customers and tomorrow I mean tomorrow like within a couple hours customers are protected against this piece of malware and it's easy right except it's not gets me to a different point have you guys heard of O RLY that's a new publisher adjacent to O'Reilly they write about the really really interesting subjects um detections how do you build a good threat detection you want to detect the threat which first means you have to have the threat to detect it which is an

interesting issue like can we write a signature to detect the malware that we don't have it's hard like there is people working in that space I've looked at that in the past there is options but it gets really interesting like we have to have the threat otherwise we can't build threat detection in most cases we have to have high coverage it's like malware has variants there is like a binary that we need to detect but like ideally we cover the whole family so we're talking about coverage do we detect like five samples or 500 or do we detect all 5,000 samples of that specific strain of I don't know the new DDoS bot that came out so there's like

this issue of like how much can we detect while also not detecting false positives and diverting a little bit like I'm not necessarily talking about signatures there's so many ways in how to detect threats these days that are not byte patterns that we extract from the binary but for the sake of simplicity let's stick with the byte patterns so a false positive means that your signature detected the thing that it is not supposed to detect we probably all know that so suddenly you detect I don't know word.exe instead of the malware which is pretty bad because that's an application that's out with millions of people and that would be called like huge false

positives that'd be very embarrassing and bad like by the way a long time ago when I was in antivirus I was just about too young to enter the industry after the time when antivirus detected and quarantined explorer.exe let's say the industry like it wasn't one antivirus company that did that there was a number of them that had those types of false positives which is a different subject like remediation you detected the malware do you really want to stop the process for those who haven't tried like if you kill explorer.exe like especially I don't know if that's still the case in Windows 10 um and older

Windows platforms that like killed the desktop so the user's sitting there and they see a blank desktop there's no more icons there's no bar in the bottom there's like nothing there and that's yeah restoring that is quite hard anyway so we don't want false positives we want to detect all the malware but not the benign stuff we need to be cost effective like we can have perfect threat detection but like people don't want to pay the money that it costs us to roll that out um that's a problem that comes in especially if we talk about machine learning models to detect threats like yeah we have great mathematicians we can build big models we can feed them all the data they can

detect threats but like how much is the customer going to pay per month to use this really sophisticated model to detect all the threats it's a different question um detections need to be prioritized back in the day when I started in the industry we had malware and that was it so like here's the detection do with it what you want these days where we operate detections like in a large environment there needs to be prioritization right there's threats that aren't that important like if you have adware that's installed in somebody's browser you don't want to mobilize your security engineering team to go there and like smash that computer that's probably not going to help

anybody if you have ransomware in somebody's system you probably want to move faster if you have this really intricate threat like APT malware that's running somewhere on your mail server you probably want to react real fast and look at that and get it out of your network so with priorities we're talking about resources we're talking about where people spend their time which is something if you build threat detection you really need to think about like we can't spam the customer even if all the threat detections are correct like if they get 50 of them a day and they have two security engineers they cannot do nothing like they can't actually react so in terms of like how to help that

customer be secure we need to be able to tell them like what to do first what to prioritize and what not so yeah when I keep saying it's complicated this is what I'm talking about all these are big engineering problems and like most of this does not involve reverse engineering the malware which is what I love to do like that's what really excites me to spend the time there but that's not really what the customers need in that sense the customers like people we're talking about people right if you want to stop malware from running on people's phones and like stealing their photos or like listening in on their phone calls that's what

we're talking about we're actually protecting individuals from being abused now another word about machine learning for security let's get back to the research there's a lot of research in that field I mean a lot it's a great field of research because there's no perfect solution if you ever looked at like how much research is published in the different fields the harder the solution is like the harder the problem is the more papers we have because the papers like they're not expected to get to the point just my personal frustration um in terms of threat detection that's where I worked we have machine learning in so many different fields now um most of them I know nothing about but

machine learning in threat detection it's an interesting field there is a stack of papers out there and I don't want to bash anyone specifically again like there's value in every paper that's published I am sure but then there's a stack of papers they use like the last 50,000 malware samples they could collect from VirusTotal that is their sample set that's what they use for malware then they have benign samples they also grabbed from VirusTotal just those had zero detections on VirusTotal for sure that's benign and that's the base data they use for the classification algorithm there's a number of problems in this we'll get back to in a minute the next problem with most of that

research is that they don't work on large data there's some challenge there if you're a student at PSU let's pick that subject you don't have access to customer environments as we call them you don't have access to like the real applications that run on web servers you have access to VirusTotal so that's what you're going to use and that is perfectly fine that is great for your research but then like if you look at that from a product perspective and look at those papers most of them don't translate into something that can be put in production of course like how would it it's clear but for us it means we're

wading through like the stack of papers to find the one idea that can fill in the gap of what we're developing um that helps us actually build a model that runs like distributed in geographical regions like on millions of customers environments and can deal with a load of data I specifically posted this paper on the slide not to call out the people on there like the paper is great they proposed a mechanism to classify malware based on byte sequences in executable files and they found a way to do this more efficiently faster and with more accuracy great um in malware classification using machine learning you will recognize very fast your biggest problem is there is no

parsers there is parsers you can extract features from malware in different ways you can disassemble them you can get instructions you can get strings out of them you can get different representations of that byte data of the malware but that always sends you down a rabbit hole of trying to fix a parser like taking a binary and putting it in a different representation to get the ASCII strings out that's a relatively simple one but like so you want to classify call graphs great now you need a disassembler and you need a way to get the call graph out of that disassembly like you need to find out which function is connected to which function which

call goes where suddenly you look at indirect function calls and you're like whoops my call graph is rough because this application is full of indirect calls and I've gone down those rabbit holes and I realize that like parsers suck especially for binaries and um so looking at the byte representation of files I believe is the way to go if you want to classify malicious files you don't get a ton of data there like you don't get any information about what you're looking at it's plain bytes right but you can write a classifier pretty straightforward the problem there is like if you look at 50,000 files that takes a while if you look at five

million files that you need to parse in I don't know whichever limited amount of time you're going nowhere we tried we got stuck because there isn't really a parser that can extract byte sequences from files at speed at this volume and also make sense there is a surprising amount of executables in Linux that are really really big and I'm talking like 500 megabytes type of exes I don't know where those come from like who builds those but they exist and so like if you're talking about time to classify a file that could be anywhere between like a millisecond and 5 minutes for a production system that's not really a good frame to work with so that

that's where we talk about skipping the large files because we can't classify them that's where malware authors get really excited because they're like oh so if the file is like bigger than 500 megabytes you don't classify great let's go there that gets us back into this cat-and-mouse game that we didn't want to enter by moving down on the byte level of the file and yeah going back and forth but like essentially what I meant to say here is like there's a lot of research for the sake of research that is good there's still important information coming out of there but like if we only do research for the sake of research we don't move detections

anywhere yeah getting into running out of time in reality again we have bigger problems we have adaptive conditions uncurated data sometimes a lot of it there's biases that we need to work with um malicious data points are hard to come by if you look at a large data set you'll spend a lot of time like just trying to figure out what it's about and then you still don't have the malware in there or the exploit activity in there that gets you down a different rabbit hole of research which is really interesting trying to come up with like how can we get um representative data of the thing we're trying to detect so say we want to find web exploits like where

do we get all those web exploits from they don't happen every day right do we run our own exploits do we craft synthetic data that represents exploits the way we think the exploits will look like in the data which gets us to a problem like is this actually representative of the thing we want to detect so you see um lots more engineering problems it is fascinating and testing in production becomes very real but I won't go into detail there scaling um scaling is hard as well if you work in large systems it's important to do the simple things really really well and that takes a long time potentially and what's always also true is that the

corner cases always happen so you're writing that detection and you know that like one in a million times it's going to be a false positive if you work in a system that's big enough you're going to have a lot of false positives it's like the numbers change depending on the environment you're in and now that we did all this here's another nice article that I found that I wanted to put at the end of the presentation um now we have all this detection engineering and vulnerability management and we have machine learning we have AI we can't get any smarter but we still have issues that are very much at the basic level like everybody here has

a home router do you know what firmware is running on there do you have antivirus on your home router do you do patching on there I personally don't I'm sorry I'm sure lots of other people don't either which is interesting like denial-of-service bots are really not that cool to look at they're very simple but also they're very effective and nowadays apparently they're involved in armed conflicts which is strange strange times we live in so almost at the end of the slides like are we solving the problem like any problem really um I tend to say yes there is issues we've fixed like exploiting a buffer overflow on the fully patched Windows platforms these

days is not that easy anymore by making it harder we push that bar up to a level where people either don't want or can't go there anymore and that's a step forward and I said this cautiously like you've seen there is enough buffer overflow exploits out there maybe that's not a good example but like it's getting harder and harder for attackers um in the overall view of the system but it's like I talked about a lot of those cool research fields of like hey can we take down an airplane in mid-flight from a passenger seat that is nothing compared to like the problem that we have with ransomware like if you've ever done incident response in the hospital while

every system was down because ransomware was like creeping through the wires and doctors didn't know what diseases their different patients had like what ailments they needed the pharmacy didn't know what medications to order and they told us like we're running out of medications to feed to our patients like in 10 hours if the system is not back up like we don't know what to do suddenly your job becomes like very very uh stressful and ransomware is an interesting example because like we don't have a fix right we have antivirus we try to stay on top of that field like people developing ransomware we try to stay on top of those campaigns but like other

than that there isn't really a switch you flip on your computer to say no more ransomware like there's no good protection that's out there if somebody would like to like research that field and come up with something that'd be great that'd be helpful similar thing with DDoS bots and Bitcoin miners there's a lot of places where malicious miners are running like not the thing that you started on your cloud instance but like some attacker's Bitcoin miner that might be one of the more prevalent pieces of malware these days and there's no good mitigation either they're really simple like they are not runtime packed they don't do obfuscation some of them come with big old banners like strings

that say hello I am something something Bitcoin miner and you would think that that's detected and mitigated it's not yeah and there are some other um rather simple ways that are security problems these days like phishing it's an evergreen there's still companies trying to detect phishing emails and they're still failing I get phishing every day like I don't know what advances we had in that field not moving very far like um you pick a field and you'll find problems that are unsolved and I think that's why most of all of us are still employed security is still a problem for the and uh yeah let's keep going the conclusion I have of this talk

is like everything's difficult that doesn't help anybody right um research is hard research should be done research is at the basis of all these advances that we've seen in the field the ransomware problem would be much worse if we didn't have anybody out there analyzing those binaries and coming up with detections I kid you not phishing could be so much worse if we didn't have email filters like things happened in the past couple decades the field has gotten better and uh yeah it's all thanks to security research but what I try to say here is every step of the way is difficult the research is the most fun and the most exciting element like I thoroughly agree

like I like to do that too I like to get lost in the rabbit hole but from that rabbit hole there's still a long list of things that need to happen for that research to actually help anybody not every research needs to help someone some research should like the needle should move forward in terms of security slowly but steadily things need to happen is it your job to do that I don't want to put that on you or me or anybody else but think about it the next time you find a bug uh not to say like think about those poor security engineers on a Friday night that get bombed with a report and then

you go patch um systems but like yeah um impact is nice isn't it so finally needless to say without research we're not going nowhere if you guys stop finding bugs and stop writing proof of concepts what are we going to patch patch management is a nightmare but like without the vulnerabilities that field wouldn't have anything to do either couldn't have an impact if they wanted to if we stopped finding and analyzing malware we wouldn't have any more signatures what would we do there like couldn't protect anybody either so yeah and a quote that one of our senior scientists said at some point I like a lot like if everything we try works out we're not trying hard

enough so like not every piece of research can have a world changing impact and it doesn't have to thanks for your attention [Applause] [Music]
