
Operational Tech binaries and the tale of deductions

BSides Delhi · 2020 · 45:08 · 53 views · Published 2020-11
Speaker: Rushikesh D. Nandedkar
Style: Talk
About this talk
The proposal covers the tactics used to solve various problems while deducing the behaviour of binaries with respect to the functional aspects of cyber-physical systems. The traits considered in this endeavour are: strings, symbolic buffers/strings in memory, YARA rules, functions of interest, import hashing, API call sequences, device probes, network probes, etc. While the shallow approach covers strings in static form and symbolic buffers, we tried to reach the depths of formal methods and symbolic execution to carve out the inputs consumed by the functions of interest and to deduce behavioural traits from them.

Rushikesh D. Nandedkar

Rushikesh Nandedkar is an engineer at FireEye Inc. His assignments have always been pointed towards reducing the state of insecurity for information. His research papers were accepted at NCACNS 2013; nullcon '14, '18 and '20; HITCON '14; Defcamp '14; BruCON '15, '16, '17, '18 and '19; DEFCON 24, 26 and 27; x33fcon '17, '18 and '20; c0c0n-X '17; BSides Delhi '17; and BlackHat USA '18 and 2019. He is a co-author of DECEPTICON (an intelligent evil twin), DARWIN (a parasite covert wireless network) and SASTRI (Static Application Security Testing Realtime Integration), a plug-and-play VM for SAST. Being an avid CTF player, for him solace is messing with packets, frames and shellcodes.
Transcript [en]

Good morning, Rushikesh. Good morning, Tom, how are you? I'm fine, how are you doing? Yeah, very good, thank you, refreshed and ready for another great day, and good to see that you are wired up and ready to go. Rushikesh's talk is going to be on operational tech binaries and the tale of deductions. I'm intrigued already. So, Rushikesh, please bring up your presentation and we will get ready to go.

While Rushikesh is doing that, please do not forget to ask questions; it is the best way of getting something out of the speakers. The most intelligent question, or perhaps just my favorite question, will win this very T-shirt, the "I love BSides Delhi" one. I will guarantee it is washed and posted directly to you. So, Rushikesh, are you ready to go? Yes. Okay, let's get your presentation up on the screen. I don't think it's been shared at the moment. Here we go. Oops, nearly. There we go. Okay, Rushikesh, thank you very much, please take it away. Okay, good afternoon, all. Today I will be sharing some of my experiences, or the struggles per se,

on the topic, which is related to industrial control systems or operational technologies: when I started working on the threat collection, or threat intelligence collection, part of operational technologies, what kinds of problems I tried to solve and how I did that. As far as reverse engineering or malware analysis of ICS binaries is concerned, I'll be discussing that. So, a basic introduction about myself: I am a researcher at FireEye, in the threat intelligence division. I try to understand and learn more about networks, wireless, the data link layer, memory (computer memory, per se) and formal methods. I'm actually a big fan of the Linux kernel

and I do small contributions to the Linux kernel as well. [Music] So, introduction: what exactly is operational technology? In our industry it is actually another name for industrial control systems, or another name for cyber-physical systems. OT, most of the time in our context, is the software automation which is deployed between the real production physical machines and the other IT systems of a company. When I say real production physical machines, I mean it could be a sugar production plant, a power generation plant, a nuclear reactor, a dam control system, a building control

system; it could be any such application of computing technologies that we see everywhere. For that matter, it could even be a signal control system as well. So why is ICS important? It is important because these are the systems which are directly controlling and impacting human lives, and when we talk of the same thing in terms of a physical production system, they are directly impacting the revenue of that organization as well. Now, another part: why are threats more prominent on these OT systems nowadays? It is because the software programs being run on such systems or computers

are kind of ancient, honestly. They were developed in an era, or with a mindset, where the need was to make something work; security was never taken into account in the majority of such software. That is a big problem, and these being low-hanging fruit for bad guys, they usually approach such targets quickly. And yes, these devices, or rather these systems and this software, have been left unpatched. There is a reason for that. One of the most prominent reasons is that the majority of these systems are producing a lot of units per minute, and taking them down is directly proportional to losing that much revenue

from the pocket. As all of us know, security is more of a trade-off, so people usually opt for not patching things and not taking down their systems. Okay, so when I started in threat collection in the threat intelligence division for industrial control systems at FireEye, I had a lot of questions. There were a lot of samples which I used to gather from VirusTotal, from various sources, from honeypots and whatnot, but these were the typical questions I had in my mind before going further with analysis. So, what makes the binary interesting for analysis? This is basically like, okay,

according to VirusTotal, or maybe some other antivirus, the binary may be shown as benign, but is it really benign, or shall I wait for some time to see the malicious behavior, or for some other antivirus to flag it as malicious? No, there has to be something from my end to deduce the behavioral traits. What does the binary do in its legitimate mode of behavior? This was one of the biggest questions, and I am still trying to answer it. That is because industrial control system binaries are not openly and easily available in the wild, so getting hold of a legitimate binary was

yet another struggle I faced. What is the malicious activity being done by this binary? This was the trickiest part as well, the reason being that, a majority of the time, what an antivirus or a sandbox interprets as malicious activity may not be malicious activity for an OT binary. Reaching out to a particular network device, say, or opening up another privileged process, which according to a sandbox or an antivirus could be malicious activity, can actually be normal activity for the OT binary. So this was another

tricky question that usually pops up whenever there is an ICS or OT binary. And the last question is one of the essential ones to answer: what makes the binary an OT or ICS binary? This is the biggest challenge I see here, because most of the bad binaries which are actually creating a problem in operational tech environments are affecting some other box and from there pivoting into the OT setup. Having said that, answering this particular question will probably cover most of the rest of the talk here.

Yeah, before going further, we first need to know what makes the legit binary important in this whole endeavor. The problem with finding legit binaries, or their related behaviors, is that there is almost no documentation available around OT binaries, executables or protocols developed specifically for industrial control systems. The reason is that these are closed-source, vendor-specific things. They are bound to the specific hardware which is being sold and are not freely available, so everything comes under the end-user license, that is, that

particular license agreement of that particular manufacturer. That makes it a little more tricky to get hold of legit binaries. As I have already said, these are specific to non-public devices. When I say non-public devices, these devices are tailored to a very specific need, and most of the time they are sold and not freely available like open source software. A huge number of manufacturers and devices: yes, that is another big problem here. For example, for building control systems there could be 10 different vendors, that is, 10 different manufacturers, and each of those 10 different manufacturers would

be actually producing 100 different versions of building control system devices, depending on the requirements and needs of the environment. That makes the scope of this whole ICS setup a little more huge. Having said that, more manufacturers producing more devices are producing more software for those devices, hence more functions, more code, more secrets. Okay, challenges. I have actually been speaking of most of the challenges so far. As I already said, these are not usually documented things. Even the user manuals which we get along with the devices or along with the code are pretty brief; they do not say anything

related to their internal behavior. Cryptic behavior: yes, the reason being that many of the protocols and many of the things included in the devices or in the software are specific to the device itself, and handling those things may or may not be possible with the commonly used reverse engineering tools, but that's not always the case. Another big challenge is that they are usually distributed in parts. Even if you get hold of an executable file, you will not be able to trigger it completely in your debugger or in your environment, because usually the big vendors

do not share the complete executable in a single instance. They share the executable in parts: part one, part two, part three. All these parts have to be in place, in a specific sequence, and only then will the anticipated functionality be triggered. That's another big problem with ICS or OT binaries. And yes, they are acutely expensive, the devices and the software as well. So, channels to acquire binaries: malware sample aggregators like VirusTotal and other places; discussion forums, yes, many times we get a lot of interesting samples from discussion forums; and

in-house collection systems like honeypots and other collection systems. So now let's start with the actual thing: tests of deduction. These are the approaches we will be discussing in today's talk. The first is static strings. Another is strings in memory; you can say it's an advanced version of static strings. Then there are YARA rules; we will be covering YARA rules last, or rather we will be discussing a couple of lines on that at the end of the session. Then functions of interest, fuzzy hash or import hash, API call sequences, device probes and network device probes, and that's it. So let's go to static strings.

Static strings: on Linux boxes there is usually a basic command called strings. If we run that command on a sample, it gives us a pretty good understanding of the strings which are statically available in that sample. I have shown here the output for a particular sample which is actually delivering PLC-related functionality. You can see that the output itself is revealing the strings in red, saying "PLC station number" or "value received from PLC" or something. So let me quickly go to the demo of this, or rather I'll cover dynamic strings as well and then I'll

combine the demo for static and dynamic at the same time. Let me check if there are any questions. Guys, please feel free to ask questions anytime you have them, or if you want to save them for the end of the session, that would be awesome as well; it's up to you. Okay, sorry. So, static strings demo: we'll cover this soon. Dynamic strings: here what we're trying to do is emulate the binary's behavior in a terminal and see how it might produce a different set of strings

apart from the static strings. If you see the output in this screenshot, there are one, two, three, four, five, six entries, and for the same binary, when I ran a dynamic string extractor, I have a couple more entries, for example "clear screen", "received error" and "received bad message from PLC". These entries are new for the dynamic strings. That says dynamic strings give you more input about what exactly is being stored on the fly, when the sample is being executed or loaded in memory. So, demo time. I probably need to stop sharing this and

I'll start sharing my VM, so give me a moment. Share screen. Sorry.

okay

I probably need to share the entire screen. Is that okay if I do that? Somebody from support, can you comment if it is okay to share the complete screen?

Okay, I'll share the complete screen, no problem. So let's go to my virtual machine here. Let's run strings.

There's something open. Oh, sure. So I'm running strings on a specific sample to check if something of interest pops up. It's kind of huge. I'm not sure if my screen is visible; can somebody please comment if my screen is visible?

Oh, sorry, in the comments I missed a question: why are they in parts? The reason is that this is basically their intellectual property, which they will definitely try to protect in all possible ways. Licensing is one part, and having them shipped in multiple parts helps them guard against infringement of their intellectual property to some further extent. So that's the reason. Does that answer your question, Sarthak? Okay, while he answers, let me go to the virtual machine. If somebody can confirm my virtual machine is visible

to everybody, to the attendees, that will help.

Okay, so let's grep this output with, say, "plc". Ah, nothing shows up. Let's say "modbus". Yeah, so a lot of instances of "modbus" can be seen here. The entries in red tell us that this is a Modbus binary and, yes, this is actually ICS-related functionality, so this is of interest for us. If I run a dynamic memory extractor tool on this same binary, let's see what output it generates. Okay, so there are more Modbus entries as well, we can see here. And let's go for a tool: I will be using FLOSS here to show you the dynamic memory extraction, or dynamic string

extraction, from memory. So, bf5 image, awesome. So it actually extracted a little more data. I'll try grepping it with "modbus". I'm not sure if this will run. Yeah, it did. You can see the libmodbus entry, which was not there earlier, has popped up, and a couple more entries related to Modbus functionality are listed here. Okay, let me check if there are any comments. Okay, so are you still not able to see my terminal?
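The static part of this demo (run `strings`, then grep for a protocol keyword) can be sketched in a few lines of Python. This is a minimal approximation, not the `strings` utility itself: it only looks for runs of printable ASCII, and the sample bytes and the default keyword list are hypothetical placeholders rather than anything taken from the sample shown in the talk.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII of at least min_len bytes (a basic `strings`)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def filter_keywords(strings, keywords=("plc", "modbus")):
    """Keep strings that contain any keyword, ignoring case (similar to `grep -i`)."""
    lowered = [k.lower() for k in keywords]
    return [s for s in strings if any(k in s.lower() for k in lowered)]


# Hypothetical bytes standing in for a real sample file.
sample = (
    b"\x00\x01PLC station number\x00\xff\x10modbus_new_tcp\x00"
    b"ab\x00/lib64/ld-linux\x00"
)

hits = filter_keywords(extract_strings(sample))
print(hits)
```

Matching case-insensitively is a deliberate choice here: a case-sensitive search for "plc" would miss a string such as "PLC station number".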

I'm audible?

Now we can. Hi, yeah, we can now see your screen, your virtual machine.

No, put it back to how it was. Not sure what's happening right now. There, that's now visible and we can see your virtual machine. Oh.

Hi, can you hear me? Hi, Rushikesh. Rushikesh, can you hear me?

Rushikesh, can you hear me? Okay, so, yes, thank you for responding. So these are two small demos. I have used the FLOSS tool here. There is another tool if you want to do the same thing on Windows: a tool called strings.txt, which extracts static as well as dynamic strings for you.
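Why a dynamic extractor finds strings that a static scan misses can be illustrated with a toy example. The sketch below is a simplified stand-in and not how FLOSS works: it assumes the hidden string is stored XOR-encoded with a single-byte key and simply brute-forces that key, whereas real tools recover such strings by emulating or running the code. The toy sample, the key `0x5A` and the helper names are all hypothetical.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def printable_strings(data: bytes) -> list[str]:
    """Static view: printable ASCII runs of four or more bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key."""
    return bytes(b ^ key for b in data)


def find_hidden_strings(data: bytes, keywords) -> dict[int, list[str]]:
    """Try every single-byte XOR key; keep decodings that mention a keyword."""
    hits = {}
    for key in range(1, 256):
        decoded = printable_strings(xor_bytes(data, key))
        matches = [s for s in decoded if any(k in s.lower() for k in keywords)]
        if matches:
            hits[key] = matches
    return hits


# Toy sample: one plain string plus one string stored XOR-encoded with 0x5A.
hidden = xor_bytes(b"\x00received bad message from plc\x00", 0x5A)
toy_sample = b"\x7fELF\x00\x00clear screen\x00" + hidden

static_view = printable_strings(toy_sample)
runtime_view = find_hidden_strings(toy_sample, ("plc",))
print(static_view)   # the encoded string shows up only as noise here
print(runtime_view)  # the decoded string appears under the right key
```

The point of the sketch is only the contrast between the two views; for real samples a purpose-built tool is the better choice.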

Okay, so having said that, let's move away from the demo and let me stop sharing the screen.

Okay, awesome. Let's go back to the presentation. So those were two small demos. Another important thing is functions of interest. Many times, when we try to analyze a binary, it is actually a huge amount of effort, but deciding whether to analyze the binary or not can be done in the quickest way possible. With the initial two tests, static strings and dynamic strings, we have confirmed that yes, there is something which mentions PLC, or Modbus for that matter. But going further, we need to confirm whether that thing is actually of importance as far as industrial

control systems are concerned. For that, we will be doing a bit of reverse engineering, and I will be using Ghidra for it. Let me share the full screen again. Okay, entire screen. So on the same set of samples I'll be doing a bit of analysis using Ghidra. Let me close all the instances. Perfect. So here it is. I'll use this particular sample to do the basic deduction. It requires me to analyze the whole binary, so let it run through this analysis. Meanwhile, if there are any questions, we can always answer them, because this will take at

least 30-odd seconds. Meanwhile, let me check if the

Okay, analysis is done. Let's go to functions. And I see an entry, "new modbus". I don't know what that entry is, so I'll click on it to check what exactly it is. At least I got an understanding here that it is some function which is seeking a Modbus port. Now I'll try decompiling that code. It's actually already decompiled here, thanks to Ghidra, but we can do it again if required. It is giving me a complete understanding now: this is some binary which is seeking inputs and outputs from Modbus-related

functionality, and that gives me complete confidence that this binary has to be analyzed further. I purposely cut down most of the parts of the Ghidra analysis as well, because that would be kind of out of scope for our talk here, and just to maintain the interest of attendees in the talk. But if you have questions on this, we can always take them now or offline, whichever way you prefer. So this is the function of interest. Like that, there would be other functions of interest, but they would be tricky to find. So let's go back to the

presentation. The demo I have already shown you. Fuzzy hash and import hash: another important and interesting thing. I won't be covering a demo here because it would take a lot of time to explain; I'll just explain the methodology in brief. What we did was this: as we have seen in the functions of interest part, this particular function, say the "new modbus outstation port" function, is used in this binary. So we tried to use the import hash to calculate a hash for that particular function and then compared that hash against a pool of binaries, to sort

them out: okay, these binaries have this particular function in common, so let's have this set of binaries analyzed in a dedicated way. API sequences: not fully relevant here, but in the majority of scenarios I have found this relevant. Where API sequences become relevant is that whenever a specific function is invoked by the executable or binary, it calls a set of APIs, and those sets of APIs have been found to be unique to specific behavior. This behavior can be compared to the analogous behavior of malware. Say, for a specific

malware, these APIs have been invoked in these sequences. A similar thing can be implemented here as well, and we have had some evidence of such occurrences which were specific to industrial control systems, and I'm not talking about malware. In the binaries which were specific to industrial control systems, we had certain functions isolated, and those functions were calling very specific sequences of APIs. That gave us another level of confidence that the binary is specific to industrial control systems. For example, reading coils: reading coils is a function we usually see in Modbus-related functions and Modbus-related utilities. So we will shortly see the

demo of how we can reach the reading of coils, not specifically for API sequences, but we will see this. Device probes: another very important aspect, or rather I would say one of the critical aspects. Why critical? Because most of the software written for industrial control system functions has some hardware deployed somewhere, either in the same box or in the local radio or local network periphery. These devices are capturing data in the form of sensing or some other readings, and they are sending it back to the software for calculation, for inference, for computation. So,

to check these probes, I mean to connect to those devices, there have to be certain remote procedure calls or raw sockets or some type of mechanism in place in the software, or in the ICS binary, which will speak to the device and fetch data from the device. Having said that, device probes play a crucial role here. So let's see if there's a demo. Yeah, this demo. Let me quickly spawn another... okay, let me check if I'm sharing the whole screen. Yeah, I'm sharing the whole screen. So I'll shut down Ghidra here and open up another shell. So for this purpose

I'll use the tool called radare, and let's analyze a binary using radare. So I ran the binary and I'm trying to analyze the whole binary, or rather radare is trying to analyze the whole binary.
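The idea that an ICS binary must contain some mechanism for talking to a device (sockets, file or serial access) can be turned into a quick check over an import list. The following is a rough sketch under stated assumptions: the API names are common POSIX/libc functions picked for illustration, the three categories are not a list from the talk, and the input is assumed to be plain import names, optionally carrying the `sym.imp.` prefix that radare2 uses for import flags or an ELF version suffix such as `@GLIBC_2.2.5`.

```python
# Illustrative API groups; a real triage list would be tuned per platform.
PROBE_APIS = {
    "file": {"open", "fopen", "read", "write", "ioctl"},
    "network": {"socket", "connect", "send", "recv", "sendto", "recvfrom"},
    "serial": {"tcgetattr", "tcsetattr", "cfsetispeed", "cfsetospeed"},
}


def normalize(name: str) -> str:
    """Strip a radare2-style 'sym.imp.' prefix and any '@VERSION' suffix."""
    name = name.strip()
    if name.startswith("sym.imp."):
        name = name[len("sym.imp."):]
    return name.split("@", 1)[0]


def classify_imports(imports) -> dict[str, list[str]]:
    """Group imported names by the kind of device access they suggest."""
    found = {category: [] for category in PROBE_APIS}
    for raw in imports:
        name = normalize(raw)
        for category, apis in PROBE_APIS.items():
            if name in apis and name not in found[category]:
                found[category].append(name)
    return found


# Hypothetical import list, as it might be copied from a disassembler.
example_imports = [
    "sym.imp.fopen",
    "sym.imp.printf",
    "socket@GLIBC_2.2.5",
    "tcsetattr",
    "sym.imp.fopen",
]
print(classify_imports(example_imports))
```

A hit in any category is only a hint about where to look next; almost every program opens files, so the cross-references of those calls matter more than their presence.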

It can take a couple of seconds more. Meanwhile, are there questions? No questions, okay, awesome. The analysis is done, so I'll check what imports are there. The list is huge, so here I'll use the basic behavior of any software, for that matter: I'll try to find something which is opening a file. Why opening a file? fopen, the API, or the system call for that matter, in Linux will open a file, and then we read or write there. So there would be an evident question from you: why open file, what exactly is this trying to do? When we say we are trying to open a

file, in the case of operating systems, or in the case of the kernel, they are trying to read something from some location, and for most operating systems everything is a file: your device is a file, your memory is a file, your normal MP3 file is also a file. Having said that, there are two entries, let me show you here. For open file there are two entries, so I'll go to the first entry and see what's there.
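Because a device is just another file from the program's point of view, device names embedded in a binary's strings are one more hint of a device probe. The sketch below scans already extracted strings for such references. The two patterns (Unix `/dev/...` paths and Windows `COMn` port names) and the example strings are assumptions chosen for illustration, not indicators taken from the talk's sample.

```python
import re

# Illustrative patterns for device references.
DEVICE_PATTERNS = [
    re.compile(r"/dev/[\w./-]+"),   # Unix device nodes such as /dev/ttyUSB0
    re.compile(r"\bCOM\d+\b"),      # Windows serial port names such as COM3
]


def find_device_references(strings) -> list[str]:
    """Return unique device-like references in first-seen order."""
    refs = []
    for text in strings:
        for pattern in DEVICE_PATTERNS:
            for match in pattern.findall(text):
                if match not in refs:
                    refs.append(match)
    return refs


# Hypothetical strings, as they might come out of a strings extraction step.
example_strings = [
    "Opening /dev/ttyUSB0 at 9600 baud",
    "error accessing coil register",
    "fallback port COM3",
    "/dev/ttyUSB0",
]
print(find_device_references(example_strings))
```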

So yeah, I can see there are more things available at that particular entry. Let's try to reach out to this address specifically: say seek to this address, and say pds, and this will give me a summary. Nothing interesting found. So I'll go to the other address here and check: seek to this one, okay, and then say pds, and I got something interesting here. It says "accessing coil register address field high". As I've already told you, coils are frequently accessed in the Modbus protocol. I have got evidence that the binary I'm trying to analyze is actually a

Modbus binary and has functionality which is related to Modbus. So this explains another thing about the device probe: from this error it is evident that this software is trying to access a particular hardware device which at present is not available in my system, so it throws an error saying "error accessing coil register address field high". So here I got evidence of a device probe. Similar to this, there could be network probes as well: there could be evidence found using, maybe, Wireshark or tcpdump. If we run this particular binary and,

before that, start Wireshark, there will be evident traffic that can be monitored in Wireshark. We can easily see that there is some packet trying to reach such-and-such IP address or such-and-such MAC address or something like that, and that gives us clear evidence and an idea about the device or network device probe. Okay, so there's another slide on that. There are embedded devices and remote devices; using traffic analysis we can easily check that. Not always, the reason being that there could be some undocumented protocols over which this software is trying to reach that device. In such scenarios

traffic analysis may not thoroughly help; for that we basically need to place hooks on the binary and get data from the context switches. That would probably be out of scope for the talk here. YARA rules: I purposely kept this slide for the later part of the discussion, the reason being that whatever methods we have discussed before can very well be converted into YARA rules. I tried doing that on an individual test-case basis and I found that there was a lot of noise generated. This noise can very well be avoided using a lot of fine-tuning mechanisms, using a very specific type of

YARA input, but that requires a bit of patience. For me it was initially a little challenging, but over a period of time, as one gets comfortable with writing YARA rules, things go smoothly. So, lessons learned: failures, and more failures, and more failures. Why? Because, again, most of the things are not documented. For most of the efforts I was making, I was really not aware whether they were going to lead to any concrete output. But that's what research is, so I'm fine with that. There's no silver bullet to sort all types of ICS or OT binaries in one go. No, that's not

possible. Like what we have seen for Modbus and PLCs, there would be something else for IEC 104, something else for PROFINET, something else for HART. So not everything can be combined into one research effort and used to sort ICS binaries from normal binaries. That's the problem. Results: for the approaches that I have shared with you, automation is pretty much possible for an individual technology. We tried to achieve automation there; we could not do complete automation, but for most things we have reduced almost

40 percent of the effort overhead. Analysis was expedited because we already had a sorted set of binaries in place, so that helped a lot in expediting analysis. It helped reduce clutter as well. Let me mention one more thing: we were able to reduce the clutter, we were not able to remove the clutter. There were still certain false positives in the sorted samples, so the clutter was not completely removed. All in all, we were able to set up a benchmark to sort operational technology binaries, or ICS binaries, for further investigation. [Music] That was the kind of benchmark we were able to set up there.
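One ingredient of that sorting, the import-hash comparison described earlier in the talk, can be sketched as follows. This is an illustrative approximation, not the speaker's tooling: it follows the widely used imphash convention (lowercased `library.function` pairs in import order, joined with commas, then MD5), but takes hand-written import lists as input instead of parsing real executables, and every sample and library name below is hypothetical.

```python
import hashlib
from collections import defaultdict


def imphash_like(imports) -> str:
    """MD5 over lowercased 'library.function' pairs, in import order."""
    parts = []
    for library, function in imports:
        lib = library.lower()
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


def group_by_import_hash(samples) -> dict[str, list[str]]:
    """Map each import hash to the sample names that share it."""
    groups = defaultdict(list)
    for name, imports in samples.items():
        groups[imphash_like(imports)].append(name)
    return dict(groups)


def samples_importing(samples, function: str) -> list[str]:
    """List samples that import a given function of interest, ignoring case."""
    wanted = function.lower()
    return sorted(
        name
        for name, imports in samples.items()
        if any(func.lower() == wanted for _, func in imports)
    )


# Hypothetical pool of samples with hand-written import lists.
pool = {
    "a.exe": [("MODBUS.dll", "modbus_new_tcp"), ("KERNEL32.dll", "CreateFileA")],
    "b.exe": [("modbus.DLL", "Modbus_New_TCP"), ("kernel32.dll", "createfilea")],
    "c.exe": [("kernel32.dll", "CreateFileA")],
}
print(group_by_import_hash(pool))
print(samples_importing(pool, "modbus_new_tcp"))
```

Grouping by the full hash finds samples with an identical import table, while the second helper answers the narrower question of which samples share one function of interest; which of the two is closer to the workflow in the talk is not stated there.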

Okay, so that's most of my talk. Bibliography: these are some of the resources I used to prepare the presentation and for my work. There are more resources; if any of you is interested, please reach out and I'll share all the resources with you. So now it's question and answer time. If there is any question, I would be really happy to answer, if I know the answer. Excellent, thank you. Stop sharing the screen, Rushikesh. Hello, can you hear me? Hello, Rushikesh. Tom, you're not audible, if you're speaking to me. I am speaking to you. Am I audible to everyone else? Yeah, I know you already, but okay.

Great, thank you, Rushikesh, much appreciated. If you could put your resources etc. into the Slack channel, that would be perfect, and then everybody will have somewhere to find them. Yeah, I'll positively do that. Fantastic, thank you. You also answered your questions as you went along, so we don't have any extra questions. The only thing I would say is I'm glad to see that the demo gods are still alive and well in India. There's never a conference that goes by where a demo doesn't go smoothly, you know. So yeah, very good, very good, Rushikesh. Thank you very much indeed for your talk, very interesting. Thank

you for elaborating on everything, and yes, if anybody would like to follow up with Rushikesh, please jump into the Slack channels, and hopefully Rushikesh will be there as well. Yeah, I thought before concluding let me share the last slide, where I have mentioned my email and Twitter, in case somebody wants to reach me on those channels, if that's okay. Yeah, sure. So if the magicians in the back could just take a quick look at that, that'd be great, if you get your screen up. Yeah, should be here. And Rushikesh, I must say I find your choice of T-shirt very, very good. I'm

really impressed. BSides 2017, excellent stuff, glad to see it. Yeah, the first edition; I was fortunate to attend that. Yeah, I've been to them all, I'm glad to say, in one form or another. But hey, anyway, here we go. Here's Rushikesh's email and Twitter, folks. Obviously a lot of knowledge and talent to tap into there, so please reach out to him on here. Brilliant, and that's an excellent exit. Look at that. Brilliant. Thank you, Rushikesh, really appreciate it.