
hello
good morning security fans um welcome to the rumsferd room my name is jenesco and i will be the mc for the first two talks today and first up uh well let's let's talk about the first talk so we can get rolling um is everything in order yeah everything good okay so as security professionals we always know that there has to be a balance between security and usability and the first talk today explores the challenge of usability within the space of multi-factor authentication now we all know that multi-factor authentication is one of the best ways to prevent an attacker from being able to exploit just a simple password system but you have to think about all of the different
ways that you can do multi-factor right you have hardware tokens you have apps on your phones you have actual sms but there are problems with that there's a definite usability issue and so today you're going to hear a story of how when usability met to fa and i'm going to have to read the list of the authors for this because it was a team effort for this it's so it's going to be hyun su kim juno li ki hong heo sang kyo cha em yong gyeonshin this is a group from kais which is the korean advanced institute of science and technology so sit back relax and hear the story hey thanks for the great introductions
can you hear me can you hear me i see great thanks so again thanks for the introductions and this is sensu kim from christ in this talk i'll present about how hard it is to strike a balance between usability and security mainly in two-factor authentication the title is when usability meant to fa and this is a joint work with a group of our team and we are all from christ and if you heard of kaist that's great otherwise uh let us begin with the introducing ourselves so we're currently in korea far away from munich as shown on the map in korea and munich are about 13 hours far away in a single hub flight so considering ourselves as a physically
distant party we really appreciate having a chance to present in the event online and the picture is an overview of our campus where you can see many buildings and grasses this is a picture of blossoms in full bloom in the campus last spring and this is the building that we are currently in with uh greenland yeah and kais is one of the leading research institute in korea so the name stands for korea advanced institute of science and technology the qs world ranking of christ is 41 which is moderately high and especially in computer security kais marks the fourth in asia and 35th in the world so that being said one may imagine how tempting it is to launch a cyber attacks
towards kaist as there are a bunch of classified experimental results and leading-edge research in progress indeed there has been and still happening attempts to exploit the id service of cheist and some black hats hackers managed to leak the privacy information from libraries electronic research note system from the attack more than 30 thousand members personal information were put in jeopardy the incident was considered to be serious security bridge and the kaiser administration decided to enhance the security overall so through through the postmortem kais decided to adopt two-factor authentication that is the well-known authentication mechanism that requires two proofs to identify a single user i'm pretty sure that most of you have may have seen and already experienced
how it works when you're using service from maybe google or github the authentication mechanism basically requires two distinct proofs where one is what the user knows such as password or pen number and the other proof shall be what the user physically possesses for example your mobile device bank card or even a usb stick while it seems quite trivial to adopt 2fa mechanism throughout the institute the problem was to consider the usability of 2fa at the same time since cass members include aged professors who usually are not familiar with complicated procedures to log into a certain simple service the fact is that they think toefe comes with no good and merely harms the usability the security team in kais had to come up
with a secure 2fa that is also convenient to use at the same time so here's how it works so here we demonstrate how the invented 2fa system works and they call it easy authentication as marked with red box and all you need to do is type your id which is for example beside munich works with rocks which is obvious then click the button so the pop-up window shows up saying the authentication is in progress then a push alarm arrives to the user's mobile device saying attempt for the easy authentication is being detected if you click on the push alarm the custom app is launched and ask for the face id or touch id immediately if success the user is
authenticated and yeah that that is all for demonstrating how the system works and i'm pretty sure most of you may have already found out how sloppy the system is being designed then about several months after the suspicious 2fa was launched in kaist graduate school of information security there was an exciting course going on called binary code analysis and secure software systems as the name of the course indicates students were expected to learn binary level program analysis techniques to achieve the end goal which is secure software system and those pro hackers around the campus who usually solve hard ctf problems as a hobby gathered to take the course and one day the instructor of the course
professor cha sankocha announced to meet her midterm which was not simply testing one's knowledge and the students were asked to find the vulnerabilities in christ's easy authentication service at the same time he explicitly mentioned not to actually exploit the system at all for the midterm 24 hours were given to demonstrate the proof of concept and write a short report about it and one of the pro hacker who was taking the course managed to hacked hack into professor's account and submitted a screenshot showing oneself successfully got into the keist academic system as part of the report and if he if he uh if he managed to do that to to do so he could have manipulate the grades
and so on so the professor got embarrassed by the student yet he was fairly satisfied with the performance of the students that they found many serious security threats of the system much more than he thought and here's the analysis of such vulnerabilities so first of all it is not even a proper 2fa it's a single factor authentication because it only requires the proof of having a device and the id itself cannot be a proof for the authentication and it's uh as as you as you've shown as we have shown in the demonstration it's very vulnerable to human error one can accidentally authenticate unknown requests by mistake and it is impossible to review or retrieve the given sessions
and the other vulnerability that we found was the authentication the whole authentication procedure could be bypassed by circumventing entire authentication process and we found it being the most powerful attack and the reason why they had this vulnerability is they had to implement a new 2fa system over the deprecated authentication design and there was a misalignment between the two and this this huge and serious vulnerability was there and we can uh in later later this talk we shortly demonstrate how this attack works and not only those two we were able to find other more serious vulnerabilities for example id leakage ddos which is distributed denial of service and iodor that stands for insecure direct object reference
an id leakage is the use of the vulnerability that user id validity check api was public so that it can be abused as an oracle for valid ids by an attacker and by enumerating valid ids hackers can not arbitrarily select uh their target for target victim and that denial of service using botnet could be also be done by with ip spoofing that makes the tracking nearly impossible and also lastly idor is an access to an unauthorized resource and again this was the other vulnerability that comes with misalignment with i mean between the legacy system and the new newly implemented system so basically there were other vulnerabilities due to poorly designed architecture of apis so here's a short demo
that demonstrates the authentication bypass attack first you the attacker can enumerate enumerate user ids by going into the cast mail service where you type in some string there and it com it shows you the match of that string to certain kist members for example here we find myeongan's id email email address and the finding is that most users have the same id with the header head of the email address so you we just use the head of the email address as an id for the attack and then we try to authenticate the user in the in the browser and with proxy for example fiddler we intercept the log in packet and then replace the login packet with
the custom string that we find that we manually find out and by replacing it we can we were one would successfully log in with victims user victims authentication without any without knowing the password or having a device so here's our suggested solutions to the institute first we strongly uh mentioned that roll backing the password off would be the trivial and most definite solution and after that although it needs the further uh backup solutions but we highly we are strongly uh we were you're very clear to say that this must be done right away but they didn't but they rejected because of uh institute issue and then we tried to suggest i mean we we also said the
the original system itself is not a 2f8 and we have to go with proper 2fa that has to come with a password and device check all together and it was funny that there were there there was already another service in heist which is vpn service that already comes with 2fa and it you it requires you to enter password and one-time password via device together to get yourself authenticated and then we also suggested to have session review and retrieval functionality in the system so if you have uh you i mean you may have seen this functionality in github or google so that you can review the currently activated sessions so so that you can list up all the
current sessions that is going on and then you can selectively retrieve those sessions which seems to be uh superiors for example if you're in if you're in germany but you somehow find found that some sessions were in i mean said to be in korea you may think that it's very superior right so that you can achieve that session and then we had uh several iterations iterated meetings with kaiser administration to get uh the system fixed but they uh strongly they were they rejected the they i mean they were not they were pretty against with our suggestions because the issue was usability i mean they agree with security issue but as we go with proper 2fa
it kind of harms the entire usability so that they cannot go for it so with the suggestions i mean with the we we kept on suggesting other measures to make it secure through the several meetings with the security administrations the only change they made was to pop the number up on the screen on both the browser and the mobile app and before so that the user before granting access with fade face id or touch id in the mobile app the user would hopefully check if the number shown on the app matches with that of a browser so that the user won't mess up with i mean mistakenly authenticate other requests or attack by a certain use certain hacker
by mistake and it partially mitigates the vulnerability as it helps to confirm which access is likely to be malicious or not and some may ask if is can can this be uh can this set to be i mean understood to be fixed solution and definitely it's no so in conclusion our takeaway message is as follows first implementing easy to use 2fa without weakening the security is very tricky so as we have seen in the last slide the improved version is still not a proper 2fa and second software testing and formal verification should help in its proper design and the two methods are traditional yet effective ways to find bugs and guarantee desired software guarantee the desired software
property and here's the references for our work and thank you for listening if you have any questions please feel free to ask [Applause] so do we have any questions for our speaker today one in the back morton is going he's going testing testing seems to be working hi um on the the issue of usability versus security versus usability um there's several systems that don't require user input but you just talked about just having proof of a usb stick you put in and what's your stance on fido 2 tokens and stuff like that where you don't have to input anything but it's just there and you just have to attach it
sorry i cannot just should i repeat i'll try um what's your opinion on passwordless login procedures through hardware tokens in the security versus usability issue security token i mean password itself is not a i mean it it it is a traditional way to authenticate but it is also traditionally known to be unsafe right because password only is is very error prone to humans human human errors and using usb stick is i mean together with uh your specific you can achieve better security i guess i mean that for sure am i getting your question right or am i missing i think part of the question is so when you have a security token on a usb for example
um what role does usability play there is it easier is it um less easy does it make it better for users if they just have to use uh one of these security tokens or oh you me is security token you mean the number shown on the screen nope we're talking about um there are usb devices that have basically set it up so that using cryptography you can plug it in and basically identify as yourself and so that's commonly used as a form of authentication and so um [Music] so in that case a person only has to have this hardware authentication and then their login and password of course oh i see and how does that come with
usability oh and i think that's the good question but i think if you have usb stick you you may not you may have hard time uh using uh logging into the certain service through your mobile device where you cannot easily plug in the usb stick right so i think that kind of harms to use usability still if we go for usb stick will that answer your question yep did that answer okay we got a thumbs up very good so any other questions for our speaker i'm checking the time um i actually have a question for you and that was bad feedback so um you said that there was an entire team or an entire group of people looking at
this in a class and i'm wondering what kinds of tools did you use to investigate the implementation of multi-factor oh you mentioned fiddler yeah go ahead sorry yeah fiddler was what's the option and uh i i only used fiddler i have to ask other you know other team and basically python okay um and then my final um yep so you you're the class that you were taking though focused on um basically finding vulnerabilities in code so what kind of tooling do you do you do or do you use in that course to actually find vulnerabilities in code oh yeah we use uh ida pro gedra and gdb of course yeah i think that's all and uh
yeah and we use we write our exploit with python and using pawn i cannot recall the name of the library right now okay and then uh for the solution that you analyzed was it i didn't get at the beginning was it a homegrown solution yeah yeah right yeah okay would you ever recommend to build your own multi-factor solution or should you just buy it off the shelf uh i think it's uh i would buy a buy it off the shelf because i mean it's very trivial to to satisfy the 2fa property so that you won't get any additional benefit by having it yourself unless you have a very clear idea that can you know balance
i mean guarantee other properties such as usability here and yeah i i if the if the purpose is just for security i would buy it off the shelf but if you do it something more fun i'll definitely go all right does anyone have any other questions okay then thank you so much for your time today um it was great to hear about your story and have a great rest of your day thank you
um
hello i i didn't expect it i was i wasn't expecting the microphone to be on so now i know that it is carry on we have just a few minutes if you're here for keeping electric cars secure you're in the right place
um
i remember that you told me
so it looks like we're all set up two minutes i'm gonna start talking at 29.
probably
welcome security fans back to the second talk in the rumford room uh once again i'm jen gennesco and i'll be the emcee for the second talk i'm going to start the time now so uh first thing if you have questions at the end of the talk which we will have time for questions i would like you to run to the front of the room and talk into the microphone so that everybody can hear and also our speaker can hear okay that being said let's get started so a few years ago there was this guy his name was charlie miller he hacked a jeep he made it really really public and everybody learned about the can bus in automobiles
but that was a gas powered machine how has a can bus evolved in the world of electric cars our speaker today is going to talk about exactly this and a few other things um we have with us today shivaranjini she is has a masters from carnegie mellon and she's working as a security researcher at to say sv in singapore and she's speaking to us today live but remotely uh to give us um a talk about keeping electric vehicles secure and to the need for a can security framework so give it up for chevron genie thank you everyone so good morgan so my name is shivaranjini and i'm going to be talking about my experience uh on keeping electric vehicles secure
by proposing the need for a canned security framework so a little bit about me so i s um the nc is already mentioned i am a security engineer with this asp singapore and we have a global presence and there is also a division right there in germany we specialize in smart cabins and smart driving including self-driving capabilities and we have over 35 years in the automotive industry so before working in the csv i had a short stint at national university of singapore where i got the chance to hack into one of these famous yellow bluetooth bikes that you see in the picture right there and i was able to hack into them and was able to ride it without having to
pay for it as a part of their bug bounty program that got me super excited about vehicle security and then i transitioned from bikes to cars which is my current focus i also have a masters in information security from carnegie mellon university and for fun i play a musical instrument sometimes and every weekend you can see me paddling along the kalang river in singapore today i'm going to be starting with an introduction of what is can and the security challenges that come along with it then i'm going to be walking you through my journey of developing an intuition detection system over the years through different phases and what are the lessons i've learned along the way and finally i will be
proposing a canned security framework that addresses some of the problems that i've encountered so far so let's start with can so what is scan can or controller area network is the main communication channel inside an electric vehicle or the crux of an automotive system so it connects all the import important components that you can think of like cruise control brake system air conditioning gateway you name it so everything is connected to this can bus and these components are called as electronic control units and chan can is the channel that connects them enables the communication between them so the main benefits of using this protocol is that it's inexpensive it's durable it's very lightweight and reliable and
it's also broadcast based unlike peer-to-peer based which means that it has much less overhead and uh that's primarily one of the reasons that electric vehicles use can as one of their major communication channels so a little bit of a deep dive into a can header or a cam frame so you can see a number of fields there but there are three key fields that i would like to highlight one is the arbitration id the other one is dlc which is the data length code the data length code is basically gives you the number of bytes of payload of data that follows along with it so here we have two eight bytes data so that would be
the dlc which indicates how much of payload is sent along in that can frame and the time at which the frame was itself sent and the data itself so the key feed here is arbitration so let's start with that so in an electric car we have many electronic uh control units that want to talk to each other and they have the scan bus which is their communication channel so each um each electronic control unit has a specific set of message ids that is always on the lookout for and once it encounters that message id on the canvas it reacts to it and starts a communication with it so since it's broadcast based and many of the electronic control units we try
to do this simultaneously the can bus has a process called arbitration which gives priority to which message can be executed first so if you have a lower message id then you have higher priorities so by logic if you have zero zero zero which is the lowest you can go then your message would be given the highest priority and would be executed first on the canvas and every other message has to wait a bit more so this number is represented by the 11 bit arbitration id as referenced here so a little bit more into the different type of can messages so we have two broad classifications one is called the periodic messages so the periodic messages are message
sense periodically to get the statuses of different electronic control units at different times so i am an ecua and i want to talk to ecub and i want to know the current status at this particular point of time so i just sent messages periodically to know hey what's your current status that kind of messages is periodic so and uh the second type of one which is called as event trigger is messages that only occur when a particular event happens so you someone locks the door someone unlocks the door you apply brakes on the car these kind of events that are happening will trigger a bunch of messages that are only pertaining to that and those kind of messages are called as
eventual messages so if you see the screenshot here this is from a can database file you can see two words that's that comes up very often one is cyclic x and the other one is spontanex so anything that's called a cyclic refers to the periodic messages and anything that's called spontaneox which is much much fewer than the cyclic x messages refers to the event triggered messages and you might be wondering looking at the screenshot what is this x and why is there a can extended written at the end of it what's the significance of it so um the x refers to an extension or an extender and it's called as scan flexible data rate which allows you to send up to 65
64 bytes of data and has a longer identifier it basically allows you to send more data in a single cam frame as opposed to the original classic can so what does this mean from a security perspective so historically when can was introduced by robert wash in 1983 it was only meant to be like a very lightweight reliable mode of communication and there is no authentication or authorization that is built into the protocol itself so in order to ensure security you have to take additional mechanisms so one such mechanism is building a model or component that would do the authentication or authorization for you so um coming to that you have one such examples of of the
model is an auto source sequocy or secure onward communications so basically what this module does is as referenced by the figure it generates a lot of message authentication code and freshness value which is used to authenticate the electronic control units communicating on the canvas and send this keys plus data in the can frame along which which allows you to send more data so if you're using a classic can sending this type of information would mean that it will not leave you space for anything else and it's not your best bet but can fd can allow you to send your security related data as well as the data it's originally intended for so but there are two challenges that
come along with this one main important one is that you have to secure the keys itself they've got them and the next important challenge is that secrocy has 2.5 times more bandwidth than can itself so in certain situation which means that you cannot use this always because you would have to sacrifice your bandwidth for security and we don't want that do we so what other option do we have to ensure that there is security and the the can bus itself the communications is secure so here comes intuition detection systems so many idea solutions sit on different electronic control unit components and monitor the messages along the canvas so an example could be gateway which can
look at messages coming in from different electronic control units um so there are broadly two very different type of um ideas with pertaining to can one is flow based which only needs the bare essentials the bare essentials meaning the message id and the time at which the uh frame is sent you don't need the data you don't need anything else so this kind of information this kind of flow-based abs is useful um because it's very generic and you can use it almost in many almost in all the vehicle models that you can think of because it's very generic it's not data specific it's not vehicle specific so once you develop an algorithm you don't have to
go around changing it again and again whereas packet base is focuses completely on the data packets or payload itself so this is also useful in certain situations so these are the two major types and hybrid is a mix of both of them so when i started to develop an intuition detection system i started with a prototype so my goal was to create a working prototype from data that is publicly available so i wanted to identify common attack scenarios and generate adapt scenarios pertaining to that and along with my team we wanted to add some machine and deep learning features to this prototype and finally test the results on a pc to see how our ideas algorithm has failed
so far so upon a lot of literature survey i came upon a very highly cited academic paper called otids which proposed a novel intuition detection system from the hcrl lab in korea so they had three common attack scenarios which is denial of service fuzzy and impersonation their data was publicly available which is kind of hard to find in automotive domain the algorithm itself was fairly easy to implement and so this seemed like a good reference point to start off my uh prototype so i wanted to generate more of such data so i used vector schenoe network box it's basically a simulation tool and use i use this and programmed more attack data sets based on the reference
model that i have with me so some of the attack scenarios the first one is denial of service so as the name suggests the goal is to either jam the canvas or try to get the highest priority on the canvas so the way to do this is to inject messages that would that are the highest priority which in case would be zero zero zero which is the lowest you can go you just try to just like spam it in a very short period of time so either you get the highest priority or the canvas becomes invalid either of this uh becomes a denial of service attack the next one is fuzzy so it basically looks uh is a spoofing
attack where it injects proof random can id and data values so for example replay attacks is a common example of a fuzzy attack so uh and a few other examples could be you have the right payload and data but you send them in wrong timings so you have a time that is expected in a normal scenario but then you just send it along in a in a different timing and another one is you have the wrong data but then you send it in the exact sequences that is expected in a normal scenario so these are all examples of a uh fuzzy attack so the data format that was followed in this um uh model was a request response model where
the request was identified with a data length code of 0 and the response had a data length code of either 4 or 8 and it had the same id so we can recognize that it's a response and came along with the payload after that so the data was collected from a kia soul vehicle and for the fuzzy attack the authors had two responses and they randomized one of it as original and the other response is attack so i graphed the packets as they arrive if you can look at the figure here and so anything any packet any response that arrives after the seventh message is considered as a loss packet so we only considered that particular
window as a legitimate ones and from the figure in an attack free state uh most of the packets even if they arrived made they arrive they arrived at a pretty good uh uh time interval it didn't take too long for time to uh get the response to arrive but if you look at denial of service and fuzzing when i graphed the arrival time you can see that they significantly increased in both these scenarios so here you see zero zero zero zero zero zero which means an attack is happening and the response came much much later and in fuzzing as i mentioned there's two responses and they randomly chose this to be the legitimate one so
as usual you can just look at the graphs you can just see oh there is something happening there is an attack that's got happening here you know it does not look normal or it does not look like it's in an attack free state so that's a very good indicator so uh with the help of my team we added more machine and deep learning features to this model and tested it and it came from 85 to 94 on the pc which looked good right the numbers look good but then i realized one thing they didn't mean anything when i requested data from the original equipment manufacturers i realized that most candida they do not follow the message request format at all
and instead follow the sequencing format so which means what i had implemented was a corner case scenario and not the norm so one of the biggest takeaways that i got was to never to trust the research scope blindly even if it was highly uh uh silent research so it was back to the drawing board so this time i wanted to create a proof of concept and i was very careful to model the attack data based on real industry data and instead of testing it on pc i tested it on ip03 which is to say it's high-powered ecu so this ecu is mainly used for autonomous vehicle capabilities so the attack scenarios were uh similar to the prototype ones
denial of service and for fussing i also included the categories of can messages account so for fuzzing i send packets that model the original one but it's injected in a timing that is different from what is expected in a normal scenario and it applies to both the periodic and eventual messages but for frequency it's essentially the same as buzzing but it only applies to periodic messages and not even triggered messages i also added an additional attack scenario which is any id which is not recognized by the original equipment manufacturer is also considered as an attack packet so i labeled that as alien id so this was the model for the clip of concept so i would have the normal data
set and there would be a pre-processing algorithm that would uh understand the um how it behaves in a normal scenario so once they knew that uh we generated attack datasets using the canoeing vector box and fed it into the ip this is the ip03 so there were two detectors which are part of the intrusion detection system the first one is a logistic detector or a statistic based algorithm which takes really less resources and weeds out common attack scenarios like denial of service and alien id the next one was a learning based detector which used a deep learning based algorithm to filter out the fuzzy and frequency eventually it would just give out if it's a zero which is the
normal or one which is an attack so um the results looked something like this for these statistic based ones so for the alien id it's anywhere from 898 to 100 which is really really good and for the denial of service it was either 82 to 83 which seems really good and it's very comparable to the industry standard so if you combine both of them the accuracy goes a little bit up to close it to 91 so this seems pretty good and in line with what i had wanted for the uh deep learning algorithms it ranged from 85 to 98 percent and this screenshot here is from ip03 once the deep learning algorithm was executed so it just says that the
accuracy is 100 here for that particular data set and it just gives you some statistic of the messages that were being processed so what were the challenges that i encountered during this this phase of um concept so the first one would be to fit the statistic-based algorithm into the mcu since it takes up very less resources and to redux the common attacks like dos and alien but then i found out that there was no straightforward way to get the idea and time stamp in iq03 because of the way that it was designed and i needed to like make some um changes in on a deeper level in order to be able to do that so um
and second challenge which you might all have guessed is that we need to make sure that the incision detection system is not the sole focus of the asu itself so it should not take up a lot of the resources be it if you put it on system ship or if you put it on mcu it has to take much it has to perform uh very well and take up very little resources as possible so uh these the proof of concept was a fairly good success but that it that's brought me to a very important question and the question was i used data sets which are simulated and the ids algorithms did really well on them so it's great
but that does not equal to a real car so if you actually deploy this ids algorithms in a real car would it still work the same way as it is intended to because in a real car you can have like some latency some delay some legitimate packets can come later than they're expected to and what happens to the ideas algorithms then would it would it still perform uh as it's supposed to or would it give you a lot of false positives and false negatives which we want to avoid so in order to answer the question i proposed the scan security framework so i wanted to test i wanted to set up a can test build which can be used to run
a canvas in a realistic environment and then develop a pentas tool which can inject uh messages into the cam so this is basically a this would be like an attack toolbox that injects messages into the canned test bed in a live environment so one of the main outputs that you can get from here is a very realistic attack data set that can be fed into the ibs algorithms you can see how the algorithm is fair you can make modifications if it's not up to expectations and then you can again feed it over here so it's like a continuous loop uh for improvement of security so uh an example which is a very uh example of a can test but sera is i have
a wired can harness setup with a lot of nodes and then you know some of which is uh uses arduino and simulates different uh components in a car like a door engine hvac and then there are some nodes which have the much advanced pc use from just a like ip03 so and some nodes which are left open so we can add more issues later on we may want to continue testing and this this connects to a usb cam interface and and then connects to a pc which runs the software message induction toolbox or the pentas toolbox which um performs the attacks needed for intuition detecting systems so uh you can also have other related components for the ecu's to work like
sensors or actuators and other platform environments to make this run smoothly and uh so the pen testing box itself would uh start off with emulating denial of service attacks uh fuzzy replay or management attacks then frequency attacks by manipulating chaoticity and to be primarily developed in python so uh with this i'm going to go to the end of my presentation so here are some of the key takeaways that i want to share with you one is that if you have a highly cited research it doesn't always mean that it's useful in the industry so always model a system based on real data and my opinion is that hybrid idf is the best bet moving forward
uh this uh flowchart here represents a model for my proof of concept which took up very less resources so i had a statistic based algorithm which weeds out most of the attacks and uh deep learning one which uh we did on the advanced ones so if you want if uh companies want to do something more specific for payload it can be just added to this model over here without compromising on the resources which means that it would be a really good solution in my opinion and finally i do not consider my prototype to be a total failure because i got a chance to be here and tell you what not to do so with that i would be ending my
talk and i'm open to questions now from the audience [Applause] hello all right folks so it is question time does anyone have any questions for oh we have a questionnaire come up please i have a question related to your ids model uh and related to your oem data that you received so isn't it like that you probably have to train a model for each different type of car model because the data will always look different maybe even inside the same oem model line uh okay yes so uh basically if you use the flow based ids which is what i do the ids would be different for each uh each vehicle model because each uh oem
have different ids and different way of processing them but the logic itself would be same but you have to as you're right you have to retrain it for every a single model uh depending upon their their own um ids and specifications that the core algorithms would remain the same if you use of the waste model okay thanks and question two if you don't mind um in the very beginning you were talking about using message authentication codes for protecting con bus messages how do you actually distribute the symmetric key for that how do you what how do you distribute the key that is used for the hmac for the microphone this key is distributed along the um
the extended cam frame so it is sent along with the frame so when you do the checking you would actually check if it's it works or not on the receiver's end so it's sent along it's distributed along the cam frame itself so how do you secure those messages with the key so um let me just go back to this slide [Music]
okay so you have a sender and receiver side so you basically have a you generate a mac and then you have on a receiver side you would uh verify the key that is being sent so i think like you would send this on a can fd and you would like uh based on the uh agreement before you would just verify and if it's if it's not okay then you know that someone has tampered with a message and if it's okay then uh you would proceed with the uh the other other functionalities right but where do we get a secret key form oh okay so uh i think this is uh i'm not really sure but i think this is
randomly generated this is something i do not completely know yet okay thank you all right are there any other questions today so then um with that we're going to close this session thank you so much for being here shiva ranjani and um have a great day thank you [Applause] so folks i think is there a break right now morton there's a break so head out stretch your legs get some air and come back for some great talks
foreign
hello
so we start exactly on time hello everyone nice to meet you all here in the ramford room or in the moonshot state my name is oliver nettinger i am seeing the next block and i'm very happy to introduce to you today zara mada she's having her talk about bluetooth low energy devices and how to i would say hector and yeah i think it's a super interesting topic because probably everyone in the room has at least one device that supports bluetooth low energy and as i've heard and got to know it's a it's not a very simple protocol and the specifications seem to be pretty insane so i'm very much looking forward to learn about bluetooth slow energy hacking
today and please spend a warm applause for zaga and enjoy the talk
okay hello uh hello everybody and welcome to my talk uh hacking like exploit development for bluetooth low energy let me first introduce myself my name is sara mada i'm a security analyst at inside attack logic whereby my main fields are iot and hardware hacking web at network planner penetration test as well as the bluetooth technology a few words to my company inside attack logic was founded 2014 in munich where we also have our headquarters we are doing mainly offensive security so red and purple teaming penetration test as well as security consulting so bluetooth low energy what is this do we need this is this important um if you have a look at this diagram which
is provided by the bluetooth stick the special interest group in the yearly market update this bluetooth special interest group is responsible for publishing and maintaining the bluetooth specification and if we look at this diagram then we can see that the estimate a total of nearly of over 5 billion bluetooth low energy devices in 2022 and if you compare this to the world population which is uh over 7.9 billion people then uh probably it's quite an interesting topic and most of you uh at least have a smartphone which uh i assume to be bluetooth uh capable so what about security
we can divide the vulnerabilities in bluetooth and for categories there are general flaws which relate to every wireless protocol as also wi-fi for example so everyone can send packets over the air so jamming a connection and denial of service attacks are very easy to realize the next big problem is our configuration issue so simplified set if you do not use any encryption then it's probably not secure a big problem from uh for this protocol itself our design issues um in the beginning of the protocol you might already have heard about uh the talk from mike ryan with low energy comes low security so the earlier version did not provide any security in the pairing at all
it's still in the newer versions which tried to uh we will come to this uh soon still in the newer versions um many vulnerabilities were discovered and published and i think the last one was at the end of the of last year which i known about and finally we have implementation issues and implementation issues can uh lead to very serious vulnerabilities of there's one very impressive example the bleeding bit vulnerability which was discovered by the armistice army's research lab and hereby the researchers detected and heaped based buffer overflow and the handling of advertising package which lead to code execution and finally could lead to network compromise of the connected wi-fi network i added a sources for reference
so getting started if you start with bluetooth energy you might you have at first the bluetooth specification which is a very large document the bluetooth specification the newest version 5.3 is over 3000 pages large and is quite overwhelming if you start nevertheless we have not the time to dive in that deep today but i will give a short overview of the of the techniques we need for the for the later exploit after this we will discuss a tech scenario have an overview of the mirage framework which i prefer to use for my exploit development have here some implementation details and finally we have our demonstration and the conclusion if you start with bluetooth you have at
first to know that bluetooth has its own protocol stack the protocol stack is hereby divided into hosted controller the host layers are marquee and blue are regularly implemented in the host operation system or your smartphone and the controller layers are separated in hardware so if you think for example the easiest example here is if you have a bluetooth usb dongle then the control controller layers will be in this will be implemented in this normal and the host layers are implemented in the host operation system the host controller interface is then implemented or realized via usb might also other interfaces are possible the most important layers for us here are the probably the security manager the
security manager is responsible for the security bluetooth mainly for establishing a long-term key and further distributing encryption keys or for example keys for address resolution and the second layer which is involved in security is the link layer which uses this previously established key for the final encryption so we can summarize the connection set up as follows we have always two or the most cases we have two two parties in the protocol the master slave device the master device is uh usually your smartphone or your computer and the slave device are peripheral devices as for example keyboards or smart light bulbs or any other smart iot product um this slave devices send advertisements to make itself
discoverable for the master device and the master device will then scan for this advertisement and if it's if this is the correct slave device to connect and the connection requests are sent and the connection on the physical layers are established and after this we have a service discovery so every slave device offers services for the host for the master device to use and if one of both devices decides that it requires a higher security then the pairing process can be started so pairing a bluetooth um uh has depends at first on the version of the protocol or of the of the hardware which is which version is the hardware is capable of and the other factor are input and
output capabilities input output capabilities are for example displays or keyboards the earlier versions were the legacy pairing so 4.0 and 4.1 uh here we have just works pesky entry and out of bed pairing out of band pairing we won't discuss it today because out of band is everything but it's not using bluetooth at a protocol just works pairing as the name says just works so the user has no further here's no further interaction from the user required and for past key entry uh at least one device with a keyboard and one device with a display is necessary so here the display device will present a pin code you i assume you already know this procedure
if you paired a bluetooth device earlier and on the keyboard device you have to insert this pin to finish the pairing process this legacy pairing is very broken and i already mentioned the talk with low energy comes low security and should no longer be used uh with version 4.2 the cq connection feature was introduced a secure connections are based on an elliptic curve diffie-hellman key exchange and also offer that just works and pass key entry pairing as well as out of band and additionally here we have the numeric comparison pairing autonomic comparison means that on we have two devices with our display and the both devices the same pin will be presented and if the
user compares the spins it verifies that the pins are identical then the pairing is complete here we have to set the have to say that just works pairing is an unauthenticated pairing so that just works parent has no protection for many middle attacks okay so reversals are very good in bluetooth hacking so we have to do something other funny and what is widely accepted are these bluetooth keyboards and i have to state that this is not manufacturer related it's just an example this is um this flaw relates to the to the specific service specification so in this case the human interface device service and what we can do here we can man in the middle such a bluetooth
keyboard and turn the legitimate keyboard into an rubber ducky device if we do not know what a rubber ducky is a rubber duck is a malicious usb stick if once plugged in and the victim's computer is able to lock or inject keystrokes to execute commands so for our example we will refer to the rubber ducky language the daggy script and try to exploit the victim computer with a very simple example which just pops up the calculator so to achieve this man-in-the-middle situation we have multiple options the first options is already stated we try to use the just works pairing which is unauthenticated and many the middle setup is very easy to establish for this the attacker requires two usb
to bluetooth dongles one for the connections with with the slave device and one for the connection with the master device and the attacker will present to both directions that has neither in nor output capabilities and as a result the just works pairing will be used if the if the devices accept this the second option is if you say okay i know downgrade text i would never connect my keyword if i have if i do not have to insert my pin code then we have a second option with the human interface device service and this scenario we first pair with the slave device with the just works pairing the keyboard often does not have the requirement to
do an authenticated pairing process and then we pair with the bus we've spoofed the device wait for the master to connect and allow the passkey entry pairing because we say okay we have a keyboard we have input capabilities and as soon as the victim inserts the pin on the master presented pin to the keyboard it will be sent to our attacker device because as regular keystrokes and therefore we can use these keystrokes for the pesky entry pairing so what do we need for this attack we need two usb dongles i prefer to use the nrf 58420 dongles they cost around about 10 euro and are supported by many frameworks and are therefore very flexible i
use them with the cpu project here we have an host controller interface usb sample which is at the end and bluetooth usb dongle then if you want to say it like this for the host layers i like to use the mirage framework the mirage framework it's very complex but also very flexible a devil fits does fit the needs for the bluetooth protocol which is also very complex so as the auto states the mirage framework is a powerful and modular framework dedicated to the security analysis of wireless communications the python framework is supports multiple protocols for example bluetooth low energy wi-fi sigp or infrared and we have a closer look here the mirage framework has several components there are at one side
the core component components which are necessary for configuration loading and execution of modules we have the internal libraries whereby the bluetooth low energy stack is implemented for example and we have modules and scenarios we will focus on the modules and scenarios today modules offer the possibility to implement a specific attack for example the man in the middle attack and can be used by multiple scenarios to to adjust this attack to our needs so we have one module and multiple scenarios and can easily modify this man in the middle attack to our needs to do so we have to define scenario signals in the module that means every scenario signals stands for a packet that the module might
receive in this example the packet is called say hello and if the scenario want to overwrite the send out this method or this packet then it would have would have to overwrite the hello world method additionally we can control if the module code is executed so if the scenario signal signal method returns true then the module code is also executed and if the scenario signal returns false then the model code will be skipped to create a module in mirage we can use the command option create module this will create a following template here we have two methods which are very self-explanatory i think we have in it which initializes the module here the most important thing to note are the
arguments so we can pass some command line arguments to our execution uh most important for example the target device address and host controller interfaces which are used by the module and we have the run method which is used for the final module code so since the model for the specific examples with the secure connections man in the middle model is very large and also complex and i also had to implement the cryptography for the secure connection in mirage we cannot cover this in detail here but i will give an overview what the model does so at first the attacker will initialize both bluetooth devices to bluetooth interfaces and scan for the first for the provided slave device as
for the bluetooth device address and tries to connect earlier than our original master device after this happens the most devices are blocked because slave devices usually are only capable of holding one connection at one time and a legitimate master is no longer able to detect the slave device after this we clone the slave device wait for the master to connect and pair with both sides depending on the scenario and requirements after this the middle setup is complete and the module is only forwarding package from master to slave and from slave to master so we will have now a deeper look into the scenario the scenario can be created nearly the same way here we have three methods on
start on end on key on start and on and are very explanatory again on starts is executed when the model starts and on end is executed when the module ends and on key is executed if any key on the attacker keyboard is pressed so um since we want to pass our human interface device key keys to uh to lock to lock the final user input to console and also we have to pass our ducky script here we will do this in our unstart method the mirage framework provides a ducky script parser function which is very comfortable so we can provide our file here and have for callbacks which i will explain on the next slide
uh on and can hear by be ignored very trivial and on key just days if we press the escape key then we end this module and if we press one we want to inject our ducky script to the to the establishment in the middle connection the best way to explain this method is if we compare this to our ducky script so at first we have to add keystrokes method which is uh most important so here we have um [Music] so for human interface interface devices the it works as follows the so if you press the key on your keyboard then uh the sticker will be sent for the key and if you release the key that the keystroke
release message will be sent and what is most important here is the timing so if you are too fast or if you are sending our keys or messages too fast then some of our keys will be skipped and if we are too slow then the user can see every single letter appearing on the on the screen which is a bit obvious um the in the packet list just is the array add delay adds the delay or for the delay parameter of our ducky script and add text is for the text for the string parameter of our ducky script so one last thing to do since we want to lock our keystrokes we have to overwrite the enslave handle
value notification packet in bluetooth these keystrokes are sent via notifications from the keyboard to the computer and here we check for the handle hex 13 and if it's the right packet then we lock our keystroke to the command line so one last thing uh the used keyboard for this example requires requires the user to put to push the button on the back side to put the device in advertising mode so the device is regular regularly not in advertising mode but you have to push the button if you want to connect this device i will get to this point after the demonstration so i hope to stop this here okay so what we can see here is on the left
side is the victim windows operation system on the right bottom we have the bluetooth keyboard and in the right top we have the attacker terminal whereby the mirage framework is called with the bluetooth secure connections man in the middle module we are providing the bluetooth device address the bluetooth device address can easily be sniffed for example with the uber tools or also with lunix utilities as for example bluetooth ctl this device is a funny fact uses a random address and the randomness of this device is that the fourth bias of the address is regularly increased by one bit so okay then we have our interfaces uh house controller interface zero zero and one uh we do not
require master spoofing in this case so the keyboard does not care about who is connecting but we need slave spoofing so that the computer holds the keyboard for a regular bluetooth for the regular bluetooth device provide our scenario so blue bruce lowe energy human interface device man in the middle and finally we provide our ducky script okay
okay that's um good um at first we are entering the scan stage so okay this is very bad sorry um maybe i can start one second sorry
so maybe this is better hopefully yes okay so we're editing the scan stage now the moment is here to push the button on the background of the keyboard after this the keyboard will be detected connected and cloned and now we are waiting for the master device to connect
so master is connected man in the middle stage is established in the background here is now the service discovery that takes a take some seconds and now the bluetooth keyboard is connected we can check now that our keylogger is working
and this keylogger is also working if the windows screen is locked so we can also capture the the login password of the user
and finally we can inject our ducky script by pressing the one button on the attacker keyboard and opening the calculator [Applause]
okay so a few thoughts here how how easy is this um you could say okay if this uh this attack is only possible if the devices are not paired well there are many ways to disturb the connection and enforce a new pairing process so the easiest way i could think of is i just jam the connection and as soon as the user is annoyed enough to repair his device i'm able to intercept this connection the next question is what you can ask yourself would you push the button if your keyboard is no longer working and repair the device i think i probably would and to prevent this attack you have to make sure that your keyboard provides a
security mode one level four this is this means that the secure connection mode is enforced so that the both devices enforce that an authenticated pairing algorithm is used so not just works pairing and then this attack is no longer possible okay that's it so questions cool thank you very much zara round of applause for him
i think it was so interesting we have some time for question if you have a question please run to the front to the microphone and um
well the bluetooth class specification also has an authenticated pairing so it's also called just work so the specifications are not identical but the pairing algorithms are comparable and if you have if your keyboard allows this authenticated pairing then you also have the problem
you mentioned that specification has 3000 plus pages and then if you look it also has a ton of specifications for profiles and services and you have different types of operation mesh and you have sort of regular advertising and then the authenticated communication how do you start with bluetooth low energy if you want to do this and do cool things like this uh it's a good question um i think if you start with bluetooth then there are uh some easy tools as for example uh gateker or beetlejack which you can start with and i think most important is that you um get used to the protocol so uh analyze the regula the the communication of devices so very easy to analyze the
communication for bluetooth protocols via the host controller interface this is um possible on linux devices via the bluetooth one monitor you can also extract the host control interface new block on android devices so this is a developer feature and the benefit in this case is that since the encryption is established in a link layer and you capture the devices in the on the host layers then they are not encrypted and you can check what the protocol is doing because if you are sniffing the protocol over the air which is on the one side very complex and not not working very well um then you have encrypted traffic if they're using encryption and then you cannot determine what is happening there
so the best way is to i think is to start to get a device check check the get the host controller interface logs put them to wireshark and analyze the connection setup and try to find the required parts in the specification it's not easy but once you get used to the specification it also makes kind of sense cool thank you very much so we don't really have time for another question but feel free to reach out to sarah in the break i think she will be happy to answer any more questions thanks again a lot for the talks adam [Applause]
all right hello everyone again welcome to the next talk my name is oliver netting and i'm introducing to you um ido cohen and arnold osipov yes um and they're talking about today um how they went and found a suspicious lock entry and in the end brought them to a very interesting and sophisticated crypter and i'm very much looking forward to hearing your talk and yeah give a round of applause for you guys and
thanks oliver hi guys thanks for attending our session from simpalog to sophisticated crypto my name is arnold as already introduced i'm here with my friend and colleague hido and we both live in israel and work as malware and threat researchers at the company named morphysek and today we want to share with you our story on discovering the babadeva crypto and we'll do so by showing you our point of view when we investigated something that might have been simple or irrelevant anomaly in one of our customers logs and later on we'll dive into the cryptos internals and understand how it works and how we use that information in order to uncover a major campaign that targets
some nft communities including some additional campaigns that targets the ukrainian entities but before we start let me say a couple of words on what the crypter is for those of you may not familiar with it um let's do that with an example let's say we have a thread actor that wants to use some open source information still for some of his criminal activities if you'll use it as is theoretically security solutions would be easily detecting it either by strings metadata or even some of the code blocks and that's exactly when cryptos comes in handy they solve this problem by taking this malware or the information stiller as an input and performing some obfuscation encryption and code manipulation
and hopefully the output of decrypter is an undetected file without further ado let our journey begin so one of our tasks is security researchers is to keep track on current trends and understand the threat landscape and its evolution and in order to do so we use classifications to divide the massive amount of data we got from our customers into specific threats or behaviors and that helps us to cluster the data and follow trends more easily so one day while going through one of our customers logs and we came across an anomaly specifically it was a bot which is an information stiller and at the time we had plenty of those infections but the thing that caught our
eye was the fact that the infection chain that led to it was something that we haven't seen before and the the clipboard was being executed from an executable that was being signed from by so by a microsoft which was legitimate and at this time we were taking those first aid ashes like every person in our field would do to violence total yeah and we were surprised to see a zero detection rate which means that no security solution was able to identify this file as a malicious and we were even more surprised to see that this was the case for multiple scans in a very long period of time we took an additional ash from one of
our prevention logs and the result was the same so at this point we were extremely curious to understand how the crypto works and what keeps that low detection rate so in order to do so we need to get our ants dirty and dive into the cryptos internals so this is an example of how the infection chain looks and you can see that it's composed from several components and we'll explain each and every one of them let's start with the installer first stager and the installer starts with the decompressing its files into a newly allocated folder on the disk and this is how this folder looks and you can see that among it has very files in that folder
and among them most of them actually sorry are in fact legitimate they belong to legitimate application or some open source application and the ones that marked in red are belongs to the malicious infection chain once the clip the installer finishes its execution it calls the file marked as one the executable and this executable sole purpose is to invoke the export function of the next stage malicious dll and as you can see it does so with just few lines of code that are placed at the very end of a log function whereas the rest of the code in that function is actually benign again belongs to legitimate application once this dll is being invoked it starts its execution
by calling some sorry some additional dlls and this part in the crypto is written in a modular way that helps the cryptos users to choose which functionality they want to use for example the functionalities that we've seen are for example a dll that performs persistency in some various different ways and a dll that can download the additional malware and many more so once the dll finish calling those modules it will load a file from that folder to memory in this example it's an xml file but we've seen many other file formats such as pdf png and many more and this file is structured as the follows in between its legitimate xml strings we have the first shell code
later on we have a configuration structure and after that we have several of um encrypted chunks and once the main dll reads and loads that file it will invoke the first shell code by holding a hard-coded offset to it this shell code has several functionalities one of them is to load the configuration structure that was placed in that xml and this configuration structure holds the decryption key and the ship decryption key for those encrypted chunks and their offset and size in the on the on that file and after sorry those configurations those encrypted chunks are actually the final payload and the second stage shellcode and after loading the configuration structure it will decrypt those chunks and
concatenating them to the original form then the decryption shell code will invoke the second stage shell code which is the injection shell code and this injection shell code well is responsible for injecting the payload to the original process memory and after so well that's the high level overview on the infection chain but for those of you who want in-depth uh information on that crypto can check our blog for more information so why did we name it babadela the following is a code snippet from one of the shell codes specifically the second one the injection shell code and if you take a closer look you will see that there is there are two d words in there
the one in green dead beef with a typo which is known to be used as a magic debug value and the second one in the red babadeda is in russian stands for grandma grandpa so it was an easy choice as there were no any other indicators to choose from moving on to what makes it so evasive well if you remember we have the code logic that is splitted to several dlls and those dlls are placed among many other legitimate files and the code in that in those dlls placed again among many legitimate code and on top of that in your variants we've seen that the main dll is being loaded by a newer technique that called the dll side
loading which helps reducing the amount of malicious files on that folder by one and results in executing the final payload from a legitimate forces as if you remember we've seen in our prevention log earlier and in addition we have the final payload and the second stage shellcode encrypted in some file format for example the xml and they are encrypted as i said and split it so at this point we have the um couple of logs from there we extracted some hashes and understand how the crypto works and i'll let it all continue from here thank you arnold so after understanding uh how the crypto works we had to figure out what our next steps i like to look at analysis and research
as two different things in by in my point of view analysis is focused on a specific piece of malware or a specific task such as incident response on the other hand research is a broad field which is redirected as new information revealed that's actually exactly what happened in our research we started by looking at the baba data crypto itself and now we are moving on to find its usages in the wild at this point we have a pretty good understanding about the inner workings of the crypter we can use this knowledge to translate and translate it into a yara rule for those of you who don't know what a rule is all you need to know for now is that
java provides you a convenient and intelligent way to search for byte strings for bytes and for strings and byte sequences in binaries so our yara is composed of three main indicators the first two are being the baba dada and dead beef placeholders are not just talked about and the third one is are few bytes from the decryption shell code which i remind you is not encrypted on disk once we have our java rule tested and ready we can use it to hunt for more samples we hunt in our own data on our own telemetry from our own customers as well as online malware databases such as virustotal and malware bazaar those databases usually let you use your own url in order to
find format samples so after running this sample collection we had a pretty large database of bava data samples that we needed to extract some information from them so we wrote an automation that extracted some iocs which i'll talk more in a bit about them in a bit and then we used open source intelligence to find some references of usage in the wild so let's go back to the automation and understand what data we need to extract from each sample to do that we need to understand what is our goal and our goal was to find usage in the world so we figured out that the summer source network activity and the final payload classification will
be enough for us and as you can see there is an example this is an example of our automation's output the left column is the final payload and the right column is the corresponding network activity network indicator which can be your full url domain or ip or even blank if we haven't succeeded and we used this output to see what manual families made the most use of the baba data crypto so this one was pretty interesting to see then after collecting and starting all of the data and we started open source intelligence we used online sandboxes malware databases social networks and of course google and virustotal to see any correlation between what we found so
far and that's when we came across the following tweet the match with one one of our samples hushes the interesting part in this tweet is that it provided us an additional information to work with as you can see there is some kind of app called mumbin with a what's supposed to be its domain but actually it's slightly different from the original one and a discord message so it's small in here the discord message that classified as a scam alert we wanted to see if there are any other services that follow the same pattern so we took the ip address of the fake domain and we saw that there are indeed uh other services lister listed lister
listed under it the most famous one you might be familiar with is a fake domain of lava labs but and then we got into inside those fake websites website looked inside the code and we were able to find some watermarking that allowed us to write a crawler which will find additional fixed website like those and after running our automation we found a bunch of them uh with a one specific common ground that they are all some kind related to the nft and crypto communities the last piece of information we wanted to verify is whether the discord message is consistent among all of those services so we entered the discord server had official discord server of which one of
them and we found that indeed they are all having the same discord scam message this point we understood that there is some kind of an ongoing campaign that targets the nft and crypto communities and we decided to dig deeper and that's what we found for those of you who don't know the nft and crypto communities are highly dependent on discord as the main communication channel the attacker knew that fact and use it used it in order to find new victims once they targeted a service it will add its bots into the discord server of this service and those both bots will send direct message to each member of the channel as you can see the the
discord message has an official looking of a username and title the discord the message body calls the victim to download the new desktop app in order to gain access to the latest features and at the end it provides a link to the official website which is different from the official one for example in this case debunk the official website is debunked.com where the provided one is debunk.found once the victim fills into the scam message and clicks the link it will be displayed with a website that looks almost identical with the one main difference that the login button is now a download button yeah um if the victims continues and download this file it's actually a malware that
upon execution provides the attacker a full control on the infected system we decided to publish a second blog post and that details about this threat actor and the attack chain which you can go and read about wait we actually tracked back the activity of this attacker and we found that it was active since november 2020 and and the more interesting part is that since august 2021 its preferred crypto is no other than the baba data crypto we just talked about so this actually explains how we got to this campaign at the first place we wanted to see some usages in the wild and we got them at this point you may ask yourself why do we publish our
work why do security companies and security researchers maintain an active blog there are actually many answers to this question but for us is to hopefully make an impact i think the this figure shows how many new targeted services were each month by this attack chain and threat actor and as you can see before uh releasing our first block above the baba there were around 15 new targeted services and our first block only briefly discussed about this ongoing campaign so the impact was of course close to none and then we decided to publish a second blog post focusing on the threat octo its evolution over time its infrastructure and much more and finally we got some impact and as of today we
can say that there are no new targeted services and we can say for sure we caused it but we for our own motivation we do like to think so okay another reason for us to publish our work is uh to help other with their research for example let's take the case of baba data against ukraine the third ukraine the ukraine third has published that ukrainian state organization has been targeted to saint both and outside malware attacks and this blog post there provided some details not many and with some iocs but didn't explain so much about the infection chain then security company named chelsea took the award work and expanded it and what they found is that baba data was used to
drop the house still malwa you can go and read more about this i encouraging you a very good stuff so let's take a recap of what we have talked about today and what are the main key takeaways we want you to take from this talk and we'll begin by understanding the threat landscape you're working with if you recall that's how the all resource story uh began at the first place we knew how certain malware infection chain would look like and that helped us to spot the anomaly between the rest of the logs look at the bigger picture although we started by inspecting the baba data crypto and its inner walking we then immediately took a step back
to understand what role it plays in a full infection chain in a real life scenario use automation to make your analysis easier when you are dealing with a larger amount of data you must automate as many tasks as possible to make your analysis and research much easier and quicker for example in our case we have automated the sample collection the ioc extraction the corolla and many more components and finally share your findings and make use of other findings this is a great community contribute to it it will help others in their work and hopefully will help you in your future work thank you everybody [Applause] cool great guys thank you very much for the interesting talk it was nice to be
taking on on your journey of discovery and also i mean linking to the keynote from this morning sharing your findings um sharing what you figured out it's a it's a really cool thing to yeah to help the community and maybe some of you got interested in also doing similar stuff so yeah we have time for some questions so when you have questions feel free come to the microphone
um so between the two blog posts uh there seem to be around four months more or less how much human efforts there is behind this kind of research and how many people were involved and was this let's say a focused effort on that or how was this done let's say so um actually after posting as i already said uh at the first ball post we briefly discussed about the campaign so we already knew that there is some kind of an ongoing campaign but we didn't go went in much details and understand it fully but we keep our eyes on it and then after after a certain period of time we decided to go back together
we worked together not only because it was uh uh it had is a lot of work to be done but also because it's fun to work together and and we got back and we decided to investigate more this threat acto and it's it was i think at the same scale of the first research but it's different because the first research is more about reverse engineering and malware analysis where the second is more threat intelligent and and those sort of stuff but yeah thank you thank you you had this slide with the bullet points what um about what made the scripture so evasive um i'm not i'm new to modern analysis but still none of those bullets was really
new to me so was is there any really specific thing that made it so evasive or is it is it actually not that hard to make a crypter with a zero detection rate so is it common is it a common thing or is it really that special so first of all it's not that hard to make a crypto so evasive but the the thing that was special here that was the fact that the script was evasive for a very long time that was the what's interesting in this part and by and this and it does that by com combining all of those bullets as you said together that's how we it does that
very nice then um yeah thanks again you guys for also making a trip from tel aviv here to munich and to present and thanks a lot another round of applause [Applause]
okay yeah there's still people coming in but um i think looking at the time you should get started so please take a seat um and yeah it seems the other talk took a bit longer so all right then let's get started so welcome again to the third talk in this blog before the lunch break my name is oliver i'm the emcee for the session and i want to introduce today to you christian bauer he's talking about building a security program for sas product development and um i think that this is how to get security right in a new product development so he will um hopefully show you how to do that so thank you very
much for coming by sharing your talk question and give an applause and have a nice thought well thank you for an introduction so as the name implies here this is basically going to be about how to secure cloud-based product development but before we start maybe a few words about myself so i've been working in software industry for a few years now i started out as a software engineer originally but then transitioned over into a security role and i'm currently working with a security engineer securing an as a service platform which is currently running on two cloud-based platforms and this talk is a kind of a summary of the things i've learned over the years both in my current job but also in my
previous job i've seen quite a lot of different cloud-based products okay so imagine you're a security engineer and you're starting a new job right today you're working for the startup called acme and you've hired specifically to build up their cloud security program that means you're the first person who really thinks about security from a holistic perspective and hopefully or knows what you're supposed or what they are supposed to be doing and when we are talking about product well their product is what you see very often these days on the one hand it's an asset service offering can be platform necessary it's a software as a service it's running on those usually well-known cloud provider platforms
and their technology stack is so something you see very often these days so the applications are packaged into containers the containers are being deployed and executed by kubernetes and you have some additional cloud service around it that you need for compute instances for storage services and so on so this is another thing do let's say unusual that you can see in cloud space these days so the question is where do you start right security is a huge topic and you can't do everything on day one you have to somehow prioritize and think about what can you do the beginning and what can you do later that means at the beginning we want to go for some easy wins right things which
are so-called low-hanging fruits but at the same time have a high security impact in terms of improving the security posture of the products so how can we do that and that's basically what this talk is about how can we define a security roadmap that somehow starts in a relatively easy way and then scales up over time so security activities that i'm going to present on the next slide are grouped into different phases phrase one is like the absolute basics that really everyone should start with and then starting from phase two we are going to have more topics that might only pay off in the medium term whether something should be in phase two or three or four is a little bit let's
say it's always biased it depends on what's your background right for people here who are have a web application security background they might complain hey there should be more web application security in there i've done a lot of work on the infrastructure space so obviously it's maybe a little bit more biased towards infrastructure it also depends what's the current security posture of a company and so on so long story short about phase one should really apply to everyone afterwards it depends what's the best approach for you but it's more likely to give you some inspiration all right so what are the security phases i have to find four phases and if you do all of
those you already have achieved a quite good security back posture doesn't mean you are completed but you have i think some good achievements phrase one at the very top let's say the basics i'm going to talk about those in detail phase two these are already some additional topics i will briefly cover them and unfortunately regarding phase three and four i don't have enough time to dive to talk about that but i have backup slides if anyone should be interested the idea is first one i said the basics phase two is some topics that are again easy wins but at the same time should payoff on the medium term starting from phase three we have some topics that
really either need a lot of effort or are only going to pay off in the long term especially for phase four so i say that don't have much time so let's get started phase one and this is actually the very first task that you have to do at the beginning remember you are coming in this is your first day and you have no idea what you're dealing with you don't know how to secure something if you don't know what you have the first idea is you have to get an overview of a cloud environment which cloud providers are using how many accounts do you have do you have a segregation within production non-production environments and most importantly you need access to
those accounts and start looking into them what kind of infrastructure is deployed in those accounts what kind of services are being used and most importantly how are those services configured from a security perspective we'll talk about that a little bit later if you're only dealing with a handful of accounts so in the airbusiness course it accounts on google cloud is called project but it's basically always a cloud environment you will easily run in this callability problem if you have to deal with those and so maybe even hundreds of accounts you can go over them manually right can you click login into every individual account start clicking right on the dashboard that's not going to scale
so you need some form of automation and there's a lot of tools out there that can really help you with that and these are all open source tools that i've listed here and that list is not even complete which tool is the best one for you depends on what you prefer working with and also how many cloud platforms you have to cover some are multi-cloud some are only supporting one individual cloud firm in particular aws because it's the most widely used one it also depends how you want to handle the tool and what kind of output are you fine with reading json output or do you need some double outputs do you want to process this further
either way i personally have used steamweb a lot because it's super flexible and supports a lot of different platforms including software service platforms like cloudflare or github but it is you can use that to automatically obtain data from your environments and specifically kind of select what kind of information you are looking for what kind of misconfigurations maybe that gives you a basic idea on how serious security was taking the beginning right if you already see some bad things there you know oh this is not a good start otherwise if it's looking roughly good you know okay i don't have to start from the beginning either way so i've used this tool to collect a lot of information and then
have to start looking for this information and the very first thing that i usually want to do there that's step two now parameter protection and in cloud we have two parameters the first parameter is identity and access management and i think this is a big differentiator of cloud compared to on-premise systems because the cloud provider api is the thing that's used to access your cloud environments and that's reachable over the internet to everyone the only thing that keeps you from getting breached on the cloud provider level is to keep your credentials secure that's why it's so important to really look at what kind of credentials are currently being used so we have to differentiate here between
human users and service accounts human user and we know you have a form or a login form you log in with a password maybe then you get in that's what at least enable s is supported actually you have console login with a password or if you need api programmatic access you have an access key both are long term credentials that do not expire and you don't have to rotate them which means that's a problem actually if you look into the publicly known security incidents of aws customers very often the initial act access vector for the attacker was a leaked access key like somebody committing it to public github repo so you have to somehow get rid of those
access keys or more generally long-term secrets so for humans there's no need to use it anymore because you can set up single sign-on the idea is you singles and on and then the human user can obtain a short lift token to access the cloud provider interfaces including the api and as we're talking about single sign-on we should obviously be using multi-factor authentication as well these days and particularly for the production environment environment i think it's worth investing in the physical authentication tokens as well like free to do which i think are very cost efficient now single sign-on is not going to work with service accounts right we need something else there i see quite often
that you have an application that's running on for example google cloud and needs access to the google cloud api that means your application is going to need a credential bible summits use those long term secrets just for that but you don't have to do that anymore because the better way is to obtain a short lived token from the cloud provider metadata service so in the case of for example aws if you're running applications inside kubernetes they have a feature called im rules for service accounts which means you sign an i role do your application that means the application can get a short-lived token from the cloud provider metadata service there is no long-term secret involved anymore those
tokens usually only have a lifetime order of magnitudes of hours it's the same on google cloud the feature there if they're using their managed kubernetes service is a workload identity which works basically the same we have application running inside google's managed kubernetes you assign a role to it and then you can obtain a short lived occur from a metadata service so that works very well if your application needs access to the cloud provided api of the same platform things a little bit more complicated for cross-cloud provider access like application running google cloud needs access to aws api or the other way around until maybe two years ago the only solution was you generate an access key
in aws you export it and basically import it into your application google cloud which means you have a long-term secret that you can leak again luckily you don't have to use it anymore because you can now use identity federation for service accounts so if i continue this example my application running google cloud obtains the token from google cloud as described here then this application starts calling over to aws and authenticates with the google cloud token which is accepted by aws and they will issue an amazon token for that now application can use that amazon duke to access the api and all of those tokens again short-lived and this works because on the aws side i
was set up saying hey if somebody shows up with a google token that was issued by google and is assigned a particular identity at google then this person can now get an aws token meaning that i've completely eliminated the need for long-term secrets on both aws and gcp on the same works with other cloud platforms as well which means means i have a two-story the risk of leaking long-term keys which is often a problem for many people or companies so that was about access keys or credentials another issue i see very often in the beginning people i mean you have to sign i am privileged right you have a token that's one thing but you also need to
define what can i actually access with the token and people often then do assign a lot of privileges right because it makes life easier for them so it makes a lot of sense to look at all those iron policies and basically see if you find these kind of situations where people like use white cards that means can do anything to cut them down so this is basically least privileged principle that you have to review there was the i am parameter the second parameter that's basically the same that we have in classical non-cloud on-premise networks which is the network parameter so what i have to look for is there two people actually have a network architecture meaning a public dmd set
and private zones where most stuff should be divided in the private zones and only the bare minimum should be in the public zoom if public is reachable to everyone and the second thing is firewall reviews that happens really often people start opening up ports to the internet internet means again do anyone and you have to look in particular for things like open sports usually when you open up an snh port with internet it takes only a few hours and people will start brute forcing you then if a password like centos sent to us then you will be breached now what i've said is we're gonna basically look at the output from the dual and go away manually right so today
everything might look fine maybe next week it's not fine anymore because somebody has already started opening up firewall rules or generated access keys so we cannot manually go over it every single day or even once a week because that's not going to scale again so we need some way of automatically monitoring our entire environment for this kind of security violations and the solution to that is called cloud security posture management which basically just a piece of software that's going to look into your cloud environments by calling the apis and checking if anything violates your predefined security policies like what i've said before right i you have access keysteer somebody has opened up the firewall rules somebody has a storage
system that is publicly reachable to anyone and there's different solutions for that how you can implement it option one is cloud provider service they have everyone supports that basically has a solution for that there are also four body offerings for that for example software service platforms basically you grant them read access to your cloud environment and they can monitor everything and then tell you or raise alerts if something goes wrong some also offer the tool itself so you just get the software package and you deploy and operate it inside your own environment just the advantage that you still have full control and only have to grant access to a third party to environment or third option open source tools that
they can use to build basically your own cloud security poster management cloud custodian has been around for many years i think it was one of the first tools and it's very popular i think with some people steampipe that i mentioned in the beginning for inventory you can also use that for cloud security posture management although at some point depending how large environment is you might run into scalability problems but it is this allows us to monitor everything automatically and once a violation is deducted you get an alert and you can run this maybe depending on the solution in almost real time or like once a day or once a week so next part sooner or later you're
going to have a security incident but it's minor majors different story but when you have to happen you have to start investigating and investigating means you have to go over your log files right because without that you can't really do a proper investigation what i found is the easiest way to do that to set up a central locator sync and to just dump all your log data in there just for storage at the beginning and the most cost efficient solution for that is use one of the storage services like amazon s3 and so on because it's easy to set them up you can drop as much data in there as you want because this will be kind of no limit and they're
very cost efficient as well you can use the seam but that can become very expensive depending on what kind of solution you are using because of the huge amount of data that you're dropping in there and the first kind of log data that we feed in there is going to be the cloud prior logs if you have a breach on a cloud provider level that's that means the attacker is going to call the cloud provider apis and those codes are going to show up in the management api logs that's why it's important that you activate forwarding these api logs into your central locator sync you make sure nobody can mess around with that so you
need access control on that now now we are just storing data right at some point when you have a security in this incident you might have to query this data and depending on how large your environment is you are talking here about covering gigabytes of data 10 gigabyte maybe even 100 gigabytes so you're going to need some infrastructure for that but as i said in the beginning our scenario is you're the first and only security engineer you don't have the time to maintain and operate infrastructure for that so my recommendation is here you go serverless you have this services that data lake varying services like aphenum or bigquery same should exist for azure that allow you that they're very easy to
set up they don't need any maintenance at all and you can clear them all in a relatively cost efficient way over your entire data basically the more data you clear the more you have to pay but if you don't vary that often it makes sense to do that it's not a lot of effort to set up right then incident response as i said when incident happens you need to be prepared and it's good that you have to do that in advance right how does your basic process look like you go in the new store need to store evidence somewhere you need this dedicated location for that makes sense to track your incidents in the issue tracking system and you need
to find in advance who's your contact person for what topic right operations you have to talk to them if customer data is affected you have to reach out to your customers but you're not going to do it yourself you need someone from customer support for that it's just important to define who's to connect for what topic in the beginning before something happens so by the time you have the incident you're ready to go and finally vulnerability management i think that's a very old topic it's not too different in cloud the only difference is we have vulnerability management on two levels we have to scan for vulnerabilities first on the container level which also includes our application dependencies
and on the host level meaning the host machines where our containers are being executed there are open source solutions available for that anchor and clear working very well primarily for containers host machines is a somewhat more complicated story i don't know any reliably available solution for that on the open source space for vulnerability scanning on the host machines and one more thing cicd based scanning makes a lot of sense yes but it's not sufficient right if i build a container image today as kenny today and he has no vulnerabilities this assessment might not longer be true next week so any continuous vulnerability scanning that sometimes some people forget about that and identifying vulnerabilities is one thing
fixing them is another story we have to write on slas for that with engineering and agree on something like one week to get a critical or high vulnerability fixed or something like that well that's already phase one this if you are busy standing but that will take you at least weeks maybe a few months depending on how much in company support you will get for that after they continue with phase two and then the first thing is defining the security baseline a secure engineer usually know what you should do right but that's usually not clear to engineering so it's very important to write it down define what are your expectations towards engineering for that and you don't have to come up with this
yourself there's a lot of good stuff out there in particular the cs benchmarks you have basically securely best practice not only for cloud providers but also for kubernetes for docker images and so on it makes sense to just adopt those standards once i've written down my baseline i need to set up the monitoring for that right to see if people are actually following my security baseline and this is where i can use the cloud security booster management because i have to implement my baseline basic in there to look for violations of my baseline then another easy win kubernetes or container deployments a lot of people when they run the application inside containers those applications are running as root
i've seen this so often i think it's a i don't know what's going on there but nobody's thinking about it there's a very easy way to override that in kubernetes you have a so-called security context object and there you have a parameter called runners use it allows you to override that you just specify your deployments no run this as an unprivileged user and then it's no longer running as root it's super easy to implement and there's a lot more security features and kubernetes that you can use finally or something that people sometimes miss if you have two applications running in the same kubernetes cluster there is no network isolation between those applications application a can always reach
application b in vias verizon that's just the waste vanilla kubernetes works the way to fix that is to use so-called kubernetes network policies which are usually never enabled by default you have to explicitly enable them and start deploying network policies which means basically it's a firewall rule that applies inside the kubernetes cluster then again as i said about using this run as user option how can you be sure that people are actually using that you have to again set up the monitoring for that that means you have to extend your cloud security booster management also monitor your kubernetes clusters to see what people are deploying there actually and how a very different topic so security and
cloud usually means you have a scalability problem that means you need to automate as much as possible for that you are usually going to need some tool right so for example i want to run a dynamic application security testing once a week i want to run some board scans maybe every day same with credential scanning maybe so i need some platform that allows me to define what tool am i using how i'm a qualities tool how i'm going to process the output and where do i push these outputs then i need the time sketcher for that this is what workflow orchestration engines are very useful for again the cloud provider services serverless allows you to do that there's
also two kubernetes frameworks for that which is algo and daktone i personally used a lot of our workflows but functionally it's almost the same as stackdown so it doesn't really make a difference or if you don't have that much secured animation of course you can just spin up a virtual machine with a crunch up on it that schedules everything a lot of this is reactive right it's how do we get things in more proactive mode this is not even cloud specific you have to hook into the software development life cycle that means i need to be removed in the design phase otherwise i will always be keep running after people get things done and yeah the bendest kind of obvious i
think right and that's phase two and then in theory i can continue with phrase three and four depending on what i want to do and so on and how to secure the bose js now now before wrapping up some final thoughts so our startup sme is doing business dealing with business customers and there are some customers that would say i really like your service offering but i will only do business with you if you have a security certification like iso 107000 or soc2 or something like that now i'm not going to get in topic if those certifications are going to improve your security posture but i think that's the wrong way of looking at you can use
those programs to actually push for your security again to get things done because if somebody asks you hey why do we have to do this you see well we need this for stock too and if you don't have stock 2 we will not win that customer so that will help you a lot because nobody is going to argue about bringing more revenue right so you've changed security of not being a business enabler which is making discussions a little bit easier in my experience yeah and finally that's my culture question how are you approaching security right my security is a very cross-functional role we have to deal with a lot of people right you talk to customers about
for incidents and as we're talking about product security we are dealing a lot with engineering you have to build a relationship with them and that means are you going to be guy who's just going to make trouble for them and go is going to blocking them all the time or are you more seen as a valuable support function because this can really influence a lot and how whether they are willing to work for fee or not right and before i wrap up i'm not the first person to think about how to do this in structured way there's a lot of very interesting material out there in terms of talks and also documents i highly recommend going for this if you are
interested in this topic well and that's it
thank you very much christian for the interesting talk i think the one key takeaway is that as a security person you should not try to be the blocker but more like a valuable resource that's a very good recommendation i think um are there any questions from the audience please step forward and um
hi thank you great talk i was wondering what is a good way to get more support by the engineering team and everybody in the company for getting a better security system that's a tough question well i think there was one of the slides that have a roof for time reasons i think you need management support if management doesn't care about security then things are going to be difficult and the other thing i think if you work with different teams there are also people who are for security and some people who don't care it's important to identify basically those people who are interested in security because those are going to be their allies and i think these are basic kind of
security champions that they can try to push thank you hey uh question about the last thing you you touched on on on security culture um yeah are they building relationships with other departments and not being a blocker sounds great until someone you hear that someone is trying to put an rdp password logon machine on a public ipv4 address so where's the there's you have the the one side where you everybody here at least thinks they have to put the foot down but then you you kind of go against that how do you keep balance yeah so there's an interesting talk from netflix and their security culture says trust but verify their idea is the fd monitoring
employees if you if one team screws up the money will show it up immediately and then they go those people they'll hey people you have to fix this and you have dashboards that show how well teams are doing they're kind of playing the teams against each other saying oh this team is really doing well you were more like down there maybe you should really do something this kind of an intrinsic motivation for other teams that works for netflix i don't know if that works in other places too all right thanks so actually regarding working with departments with our department from my experience it helps so this also knows with others when you accept that they have problems or issues
when they when you take them serious from my perspective they take also serious if you see that you're competent most of the time what what uh developers and engineers think okay someone comes around and he wants something but we have other problems so that's just a remark i have two questions um first you talked about setting up a perimeter there's nowadays this thing called zero trust model and there are a lot of guys are saying oh we everything has to be secure with zero trust what do you think about that one would be the first question yeah so i mean it's possible right zero trust i mean i'd say the thing is with uh with the parameter you kinda
so if you had said in this later phase you remove the parameter and do everything for example in vpn or separate content control and data plane then i would say okay for the first step parameters are good but when my experience especially still in cloud people are doing oh i'm inter in the trusted parameters so that's no problem anymore right well i mean i think it's not going to i'm not going to say this will still just remove the network pyramid you still have it right said in kubernetes inside this idea you are inside a trusted island then everything is good i think zero just for me is the opposite it means you means micro segmentation and
micro segmentation for me includes network parameter maybe on a smaller level but still it's there and i think the im is actually in the cloud is very easy to do serial rust for human users you can say you allowed you can set up imports that say you log the login but only monday to friday and only from a particular ip address range for example i think your cloud is maybe the easiest place to implement zero trust yeah the second question i was a little bit surprised that you did not talk when talking about to the developers about the supply chain stuff because especially i'm i'm more of an infrastructure guy myself but the thing is
uh with total with the packages situation today in our days i would say okay we have really look where software where the developers are they're getting their software from and their libraries so um in which phase would you say is that later on it depends okay so well i mean if you had problems or in this direction you should pull this into phase three at least obviously right i mean i mean availability management is companies a little bit but not completely obviously but i'd say i probably put it into phase four okay cool thanks over time a bit already is it a quick one or yeah maybe i can go cool thanks thanks again for the presentation so i'm
wondering so here you are starting with the assumption that you have a full access into the network and to own to the system visibility how much would it be different if you have a partial or a non-view you know if some part is outsourced or you're you're a part of the team that is actually doesn't have access to the entire system well if you have no view you have a problem right i mean you can still do the parameter right scan that but that's it if you have a partial view it depends i think a lot what do you have access to like do you have slippery access to copyright api or do you and
that's it or do you also have to look have access into host machines and things like that i think you should always go for acts as a security team because actually most club providers will give you a dedicated security read-only role just for that purpose it's called security auditor rule and you should always go for that i think if you don't have that then you can't do work i mean it's impossible cool great stuff um thank you very much christian thanks to the audience for the good questions [Applause] so now we have lunch enjoy lunch and there's also soft drinks in the fridge now if they're empty please approach the people from hilton from staff and
enjoy
okay everybody [Music] welcome back from the break please take your seats we're gonna start in a couple of seconds so talk after the break let me welcome dimitris brazakis for his talk attacking malware with ai have fun thank you thank you i have mike thank you hello everybody and welcome i hope that you have enjoyed your lunch break and that you're having an interesting and educational day so far i'm really glad and delighted to be here with you today at besides munich where we're going to have the chance to explore one of the places where the finest concepts of data science and cyber security meet before doing so i would like to really quickly introduce myself to you my name
is dimitris prasakis and i'm currently pursuing my masters in cyber security at georgia institute of technology georgia tech in the us i'm also working as a security engineer at trade republic in berlin where i specialize in cloud security and infrastructure security minders also lie in ai and privacy and especially where all these worlds intersect with each other and one of those intersections is what we're gonna see here today together our agenda is made out of four parts the first two of them are introductory we're going to talk about malware analysis and about sandboxes and we are also going to talk about machine learning and introduce the concept the tasks classification and clustering the third part of the talk is the heart of the
talk my your framework it's a malware analysis framework with ai and we're going to explain how it works step by step and we're also going to have a decision for questions and discussion so without further ado let's start and let's talk about malware analysis in cyber security there is a job title malware analysts and these people their day-to-day job is to analyze malware and they do this with the help of certain tools called sandboxes a sandbox in short is an isolated environment in the context that everything that happens in the sandbox can't escape it if we execute a program in that sandbox it can't infect the host right so that's the first thing it does
and the second thing it does it is that it has the ability to study and monitor the behavior of what is being executed inside of it so in short it takes a binary malware binary as an input it executes it studies it and then produces a text report and this report will contain things like network requests made or metadata about the malware but today we care in particular about commands this malware tried to execute and we're gonna talk about windows malware specifically and windows api calls apart from sandboxes the second introductory point is the task of classification classification is a machine learning concepts is the task of taking an observation something about we don't know the
category it belongs to right and putting it into some pre-known category into some class to be more formal so let's say that we download an executable from the internet right the task of classifying this executable to benign or malicious to good or bad is called classification and in machine learning we classify objects with the help of classification machine learning algorithms and one of the most known ones is k n and k nearest neighbors and it operates based on the folks saying show me your friends and i'm gonna tell you who you are or in other words show me your nearest neighbors in the vector space and based on their class i'm going to predict your category and
we see here the example where this observation of the gets classified as a benign program because the nearest neighbors are benign apart from classification we have other tasks in machine learning and one of them is clustering and unlike classification which is a supervised kind of learning because we feed our algorithms with labeled data so we fit the algorithm with data that have labels clustering is a task of is a kind of unsupervised learning we feed our algorithm with unlabeled data and the clustering algorithm tries to create groups of those unlabeled data clusters which we are going to equate today with the concept of classes let's see how all these three concepts sunboxes classification and clustering
bond together under a framework for malware analysis called my year but before doing so i would like to talk about a very interesting statistic every day almost half a million new malicious software and potentially unwanted applications are registered i would like you to think about the volume how much volume of malware this is we need to understand this malware we need to study it so that we can attack it right before it attacks us we can't do that manually we don't have enough manpower we don't have enough bandwidth right malware analysis is a very energy consuming process and it takes time there is therefore an imperative need to automate this process as much as possible
excuse me and this is where my ear framework comes into play my year framework is an open source framework for malware analysis with machine learning and it is used to do mainly two things to discover new clusters of malware in other words to take all these half a million malicious software every day and try to group them into clusters that contain malware with identical behavior and to also classify unknown malware observations into clusters that it created and we are going to see how this happens in a bit my ear framework looks like this a simplified version of it we talked about the first concepts right so we take all marginal binaries we pass them through sandbox produce some text
report and we talked about clustering classifications we want to cluster and classify classify those malware reports but in essence clustering and classification are statistical algorithms right they don't understand text files so this is where the second step embedding of behavior comes into play we take those text files and we transpose them into mathematical objects into vectors so that we can later classify them and cluster them there is also a step in the middle the step of prototype extraction is a very beautiful idea which we are going to talk about in a bit so without further ado here what you see is an xml encoded malware report and i modified a little bit for the sake of
the presentation and for understanding better what's going on we see three snippets of instructions the first one is something you're going to see many programs do the program tries to load the windows kernel into the process memory and also it reads the time nothing malicious about this snippet of commands right but if we scroll a little bit more down and we study the next snippet the next sequence of instructions we're gonna notice something at least suspicious the malware tries to copy itself from the location it is to a windows protected area called system32 and while doing so it also renames itself to a windows well known process cscs rss.exa and after doing this it also modifies the windows registry
keys so that it runs every time window windows start up so that's not something a benign program would do right why would you copy yourself into a protected area and then put yourself to startup you may do the latter but especially those commands in combination show some malicious behavior and we see another example another snippet where the malware just tries to check if it's being debugged and malicious programs usually do that excuse me to understand if they are being analyzed and in this case they try to shut down their malicious behavior and conceal themselves they try to hide themselves from the malware analyst there is an important observation to make to be made here from this example
that behavior is often manifested as a sequence of instructions which we formally call as q grams so a q gram is a synonym for a behavioral pattern let's say and we saw two examples of two grams because they are made out of two instructions and this concept of qgrams is important because it's what enables us to embed the text files into the vector space we want to do that first of all so that they can be read by the clustering classification elements but also because embedding the reports into vectors will enable us to express their behavior in a geometrical way right similar reports similar malware are going to be closer in the vector space and vice versa
we embed the behavior using a mathematical function called embedding function which looks like this we are going to explain in an intuitive level what this does simply it gets an input which is the malware report a sequence of instructions right and produces an output an n-dimensional binary valued vector where its dimension represents a cugram it represents a behavior for example imagine that we have the three-dimensional space right the x-dimension z-dimension the x-dimension and the y-dimension we may embed this behavior this kilogram in the x diamond in the z dimension excuse me and say for example that the malware tries to copy itself into the windows protected area rename itself and put itself on startup if the value here is
one or otherwise if it's zero the malware doesn't try to do it and we may embed this behavior the malware taking for debugger on let's say the x axis right if it's one the malware tries to do that if it's zero the malware doesn't try to do it imagine this for n dimensions in the n-dimensional space right something really interesting happens when we embed malware in the vector space like that because each dimension represents a behavior right similar reports will be closed with each other as we said and they will form dense clouds into the vector space if we look at the sp this space from far away we're going to see clouds clusters of malware and
each cluster has malware that behave in similar ways and this exact distribution is what we are gonna attack in order to exploit the malware before it does that to us before talking about that however let's dive into a very interesting topic of computer science computational complexity algorithms take time to be executed correct and especially machine learning algorithms scale in a super linear way let's say in polynomial time depending on how much data we give them that means that they are quite slow and we want to execute this framework every day so there is an imperative need to reduce the time of the framework as much as possible there are three things that we can do
either buy more memory and more cpu but this is limited right and by default we have some physical limits as well we don't have quantum computers for example so that's that option is out the second thing we can do is use faster algorithms but again machine learning algorithms are refined like they are as good as they may get we we may be able to modify them slightly depending on the program problem but mostly we can't really reduce how much time they take the last thing we can do and that we are going to do is reduce the data that we fit the framework so what if instead of feeding the framework with half a million reports every day
we fit the framework with let's say compress them with a ratio one to ten and we feed it with five fifty thousand reports this is where very interesting and beautiful idea from 1989 the idea of prototype extraction comes into play instead of analyzing all these initial model reports all these data points what if we could extract representatives what if we could choose certain data points that would represent all the others we call these data points prototypes and luckily enough there is a prototype extraction not the optimal one but a good enough one that runs in linear time so we take all our reports and we try to compress them into a small set of reports which we call prototypes
and instead of feeding our classification and clustering algorithms with the initial reports we're going to cluster the prototypes themselves and propagate the results of the algorithm back to the initial data points and see we see here we see the five reports right that we extracted before the five prototypes excuse me and they are being clustered into three clusters one on the top right one on the bottom left and one on the top left one would say it doesn't really match makes much much sense to create a cluster of one person right like we do on the top left why would we do that why would we create a group of one and we would agree with it with whoever
claimed that right and this is why the algorithm takes a parameter and we say that if a cluster has less than m members if our clustering algorithm creates a cluster with less let's say than two members we are going to reject this cluster we are going to put it on the site and we are going to keep these data points for future reference maybe tomorrow where we are going to have new malware reports some new prototype is going to be extracted on the top left and as a result maybe a new cluster will be formed because it's going to have more than m members and the same exact idea applies to classification we draw our decision boundary and we may
say that if a new observation falls here for example the closest prototype is this one as a result we're gonna classify our new observation into this cluster but if it falls more than dr distance away let's say here we reject it and we keep it for future reference we described the simplified version of malia we described how we embed malware into the vector space how we extract prototypes out of them and how we can use these prototypes to classify and cluster faster the malware what we didn't talk about however is how we can make this framework recurrent we said that it's going to be executed every day let's say and analyze half a million malware right
this is where a last concept comes into play the idea of prototype incremental analysis excuse me and the framework looks like this so the first steps are the same we take the molar binaries we monitor the behavior we embed the behavior and then instead of classifying the initial data points we classify the prototypes on the rejected prototypes from the classification we run the algorithms of prototype extraction and we extract prototypes out of them and then we feed these prototypes into the clustering algorithm and these new prototypes maybe will create new clusters with the yesterday's prototypes and so on what we described basically is this algorithm this is the heart of my ear this is the entry point the main
function of it and it invokes all the other three algorithms that we talked about and this way our model gets stronger and stronger every day our framework can create new clusters of malware and as a result it can identify new malware and in this way we can attack the malware before it attacks us the talk is based on a paper called automatic analysis of malware behavior using machine learning it's written i believe by two german professors and one austrian professor about 10 years ago but it's still relevant and it's very interesting if you enjoyed the talk i urge you to give it a read because it also has some concepts that we did not have the time
to talk about today i want to thank sneha for guiding me and helping me prepare for this talk today thank you sneha i really appreciate it and i also want to thank besides munich for giving me the chance to be here with you today and explore one of the places where the finest concepts of data science and malware analysis meet i would love to connect with you i have a qr code down here i hope it works it's not the malware i promise you don't have to believe me though right so you can search me by sorry right correct so yeah you can also search me by my name and i would love to hear your questions
and discuss with you about what we presented thank you [Applause] awesome thank you very much any questions just come in front here please and ask your questions
just come in front thanks so we can hear you online as well hello hi hello first of all thank you very much for the very impressive introduction of your system i have just one quick question about it where do you get all that information related to the malware from because i mean it's a huge amount of data you need to put into the system and are you do you have any sources or where do you get so from usually we get those that's a very nice question actually we get those from honeypots right we a honeypot is something like a sandbox we could say that we exposed the internet right and it has some vulnerabilities and attackers may get
um access to this honeypot and try to install some malware but they failed to do it because this is how hotbots work so we take this malware symbol and we throw it into my year so honeypots usually is the way how we gather those apart from that also self-reported malware from people or things that either virus has already detected let's say okay perfect thank you
any more questions
i will suppose it was clear then awesome thank you very much i'm sure you are available for q a later on yeah i would love to talk with you thank you very much [Applause]
no i'm going to the other one
[Music]
look okay everybody so take your seats please wonderful so the last talk here in the moonshot track for today and actually for the whole conference live from india ashika ashid and manan will talk about how to get into how they got into a private code database or code base and actually i love war stories so this is going to be very interesting and the stage is yours have fun thanks avio um so i'll begin uh hi everyone here here we are to present our talk titled how we got into a unicorn's private code base through analyzing millions of mobile apps moving on to the next slide i am ashika i am a security research and technical
writing in turn at cloud sec
i should i believe you're a mute
hello is it
uh hello everyone my name is rashid jain and i am working as a full stack engineer at cloudsick so i love to automate anything and everything related to security apart from that i love to travel and read books hi everyone i am anand and i work as a cyber security analyst intern at classic and i love to play the drums and play football yep cool so let's move on yeah so the agenda of our talk our talk is basically divided into three parts in the first part we are going to explore uh what exactly happened with the unicorn that we're talking about in the second part we will talk about how we analyze and find vulnerabilities in
mobile apps and in the end part three we are going to uh talk about a large scale study we did of over one million plus apps and the impact so moving on let's begin with part one the next slide please yeah so source code leaks have been in the talk in the news for many years now and the biggest one that comes of the top of my head is the recent twitch leak that happened in 2021 where we saw that malicious actors posted on the 4chan forum over 6000 internal git repositories of twitch which contained 200 gb worth of data and 3 million documents so just by the scale of these numbers you can understand how big this leak was
and earlier this year we did a little something of our own which we'd like to share with you guys so moving on let me present to you the big fish so it was a casual work afternoon for us we were just doing a normal thing analyzing top apps and finding security vulnerabilities in them and reporting them to the uh organizations so what we found was really interesting we found that there was this app which had 10 million plus downloads in play store of a 120 million worthy unicorn that had a really big security vulnerability with which we could uh basically view that entire source code and do a lot more uh a lot more things
with it which we'll tell you in in the coming slides moving on so let's explore what exactly happened so what happened was that uh while exploring the uh the gold base of this app we were analyzing it on uh this search engine that we built and what we saw was that there was something called as a guitar personal access token which was hard coded into the app's android bundle file and it was there for everyone to see and do anything with it so that's not exactly desirable especially not from a unicorn a lot of questions might be arising in your head people might be asking what is a github personal access token and what can a person do with it so we'll explore
that in the next slide so a github personal access token is basically an alternative uh to using passwords for authentication in github using the github api or the command line so uh with this token basically uh you you get a lot of privileges and we ran this query as you can see in the screenshot on this on the screen what happened was that we passed the github token that we found in the previous slide and we wanted to explore what scopes uh is this guitar personal access token giving us and you can see the output in the screenshot we could see that we're getting admin privileges we we could delete the repo we could delete packages in the repo
we could change the workflows we could write packages we had the reposco we had the user scope so out of all of these scopes the most important is the scope repo which gave us full access to the private repositories so uh just to give you an idea of what we could do with this scope was uh basically we could uh commit changes to the organization's github repo and push them we could also invite other collaborators uh and ask them to do the same we could mess around with their deployment workflows we could delete the repo basically in the end to sum it up we could do anything with the repos with the scope with just the scope and that's
a pretty big deal now moving on to the next slide uh after finding out uh that we could get access to the private repositories we wanted to see how many private repositories can we actually get access to so we run this query that is shown in the screenshot we passed the token against this uh endpoint and we were able to find that we could access the organization's all 26 private repository and that's the impact was pretty nasty so moving on uh to the next slide uh and this is what the private uh repositories urls look like so you could conclude that we could get access to the unicorns ios apps their apis and their normal android mobile
apps finally this is how we got access to the unicorns code base so after finding this issue we immediately reported to the company and they acknowledged the issue and fixed the bug and the token is now no more there moving on so let's count the mistakes so how are we able to do this entire thing there were two major mistakes that were on the developer's part first mistake was hard coding the github packet bad token in the source code in the first place so once we found that hard-coded github pad token we could clone the repos on a system and just view their code base so till now we are just able to view the code base
but where things went really south was mistake number two uh with which like they gave excessive scope to that token and anybody could use it for exploitation like anybody could commit changes to their private code base anybody could invite other collaborators and team elect change the entire organization structure and whatnot so these were two really big mistakes that they made on their part now uh the talk is titled how we analyzed uh how we got access to a unicorn's private code base by analyzing millions of apps so surely millions of and analyzing millions of us apps is like basically not it's something a person cannot do manually it's a herculean task to do manually so
we automated this process and we did a lot of analysis on our heart and i'd like to give it over to arshad my teammate uh who's going to explain about this part a little more in detail
why should i believe you are still on mute
hello
so till this point like many questions would be arising that how did you start scanning the code base of this app how did you reach to the secret at the first phase first place so let me tell you a story one day our security team out of curiosity were doing a study on secrets that can be found on github public reports and to our surprise we found like there were many secrets that people have just hard coded in the public airport like stripe keys razer pickies aws credentials and assets like firebase urls and aws cognito urls so this was like a big mistake that the developer have done on public reports so then we thought these are public reports
which anyone has access to so what about the source code that no one has access to like mobile apps so then after doing some initial research we did not find any good tool that can just like tell us about uh secrets that are being leaked across the mobile app so then for analyzing the mobile apps we build our own security searching so here are the steps that we followed to build this mobile app scanner so first was we collected the data second step was we decompiled the apps that we collected into readable source code third was we created a set of rules uh of rejects so to identify the hard coded tokens and the third was to
build a interface where we can just do a rejects or like keyword search and it will return you all the results on all the source code that we have collected so uh let's deep dive into all the steps that we followed to build the system so the first step was collector of mobile apps we thought can we build a system in in which people can just come upload their mobile apps and they can get their security reports uh and just know the secrets and issues in their mobile app so user submission was the first source that we collected the data from the second store source was we collected uh the android apps across the internet so that included uh apps that
were being downloaded from play store and as well as third party app stores so while we were crawling the play stores we faced many difficulties like sometimes we were not able to uh download apps from certain countries or sometimes there were issues like some apps were not compatible with certain devices but we were able to overcome all those uh problems and the reason we downloaded the apps from third party app stores was because we wanted to analyze the fake apps that have been wandering around on these third-party app stores because these apps on third-party app stores contains malwares or sometimes people have just tampered the certificate and uploaded the apps in the system so currently we have about a million app
that has been indexed in our system so after collecting the mobile app we thought uh let's like decompile those apps into a readable source code so to do that we used an open source android tool called zx so zx it helps you to decompile delve by code into java classes from apk and dex files it also helps you to decode android dot manifest dot android manifest xml file so after decompile all these apps we thought let's we stored all these apps into a file system so while we were storing these apps on file system these apps size is like uh app sites get increased uh due to uh like many things that are there in
these apps so we did some optimization on that pass apart as well so after uh decompilation we thought uh let's uh how do we find the secrets on these uh source code uh that we have so for that we started building uh rejects is uh that can be found on on the source code so for building those rejections we thought let's like maybe analyze uh some mobile apps and see what kind of uh keys usually do developer uses a developer use in the mobile apps so after this we started building the jigsys and there were many difficulties that we faced while building those rejections like sometimes the length of the key was not fixed other time we were not able to
even find the rejections for it so uh for those for which we did not find the rejections for we went to open source repositories like iphone and truffle after we had all the digixes in in a place uh we took all those uh rejections or we took all the keys and just put up all the keys in an android file uh so apart from putting it uh specifically in a class we put it all the keys in android dot manifest file java classes and then then after putting all those keys we com compile the android app and we put the android app through our system so that we can see whether we are able to detect the particular key or not
using our reject system and that that also help us minimizing the false quality so after collecting the apps decompiling it and building the rejects rejects for all the keys we went and we automated the process so we wanted to provide an interface where people can come just do a keyword or reject search and it will just list out all the results that we found using our source code so and this is how we were able to reach to the unicorns uh github token using the rejections that we built for get back token so this is the one you this is only the one use case that we found uh where using a github pack token you were able
to reach unicorn's private uh code base so imagine what can happen with uh all other keys that are hard coded in the source code so let me call manon on state he'll give you the insights of impact of these secret leaks and why do developer do such mistakes
well let's take a deep dive into github pids so we were able to find 159 private repositories from 151 github tokens that we found by analyzing mobile apps that had installs ranging from 100 to 10 million on the play store this may lead to leaking or secrets like database configurations which in turn lead to decline in brand confidence and imminent financial losses by the way all organizations involved with the above leads were informed of the same and corrective measures have been taken by them well we can lie but numbers definitely won't till now we've been able to find over 1.6 million hardcoded sensitive tokens as you can see in the pie chart some of the prominent ones are firebase
and aws readable writable buckets dropbox api keys facebook secrets etc now that we materialize things a bit by seeing the enormous number of leaks and moved ahead of the abstract part let's now take a delve into the impact of these keys and what malicious actors can do with them the best way to make you all feel the intensity would be able uh would be to take you through some examples well let's take up email automation first there are email automation services like send grid mail jim milgan etc let's analyze what happens when the secret token for one of these services gets leaked the attacker gets access to read and send emails from all the accounts
associated with that particular key using this key the attacker might be able to start a phishing campaign from the official mailing channels of that organization which which generally would be trusted by the customers or the consumer also the attacker might be able to gain sensitive and personally identifiable information like names emails contact numbers etc all those customers similarly if the attacker gets access to tokens for services like razor bay stripe etc which are payment processing tools they might be able to do very nasty stuff that might tarnish the reputation of the organization and further cause financial damage as well other than that they might be able to gain access to transaction details of the customer and organization like
credit and debit card information well now i think all of you would have gotten a gist of how frightful the consequences of such leaks are therefore it would be a good idea to talk about why developers do this well to be able to answer this question in a succinct manner we'll have to look at some common problems faced by android app developers we all know how much of a pain it is to focus on building security and crcd pipelines instead of building the actual application and have divert efforts other than that there is an issue of awareness many developers think that it is okay for them to leave hard-coded tokens in their source code as it is
gonna be compiled into an apk before publishing but as a cyber security community it is our job to make them aware that such is not the case also companies can sometimes feel that money is better spent in other domains rather than spending it on proper security testing on their mobile apps one example would be an app called clubhouse which became very popular but it didn't implement end-to-end encryption on their rtc packets thus any attacker could perform a man in the middle attack and snoop on private conversation well now that we've discussed the problem let's go through some of the solutions for the developers scoping is the most primitive method of stepping up the security of your app you
can do this by assigning the secret keys only the necessary permissions so that even if the attackers get access to that key the damage is controlled and limited using of environment variables to store keys in another solution so that they are not hard-coded in the code but rather embedded into the operating system and relatively out of the reach from the attackers other than that making use of git hooks like husky to prevent yourself or anyone in your team to even push secrets by mistake also you can make use of walls like hashicorp world to safely store all of your secrets most important of all is building a very robust security pipeline so that your application is secure from development
till the time you publish it now i would like to discuss some of our future plans till now our research was more centered towards android applications but in the future we aim to expand our scope to include client-side javascript and ios applications as well lastly i would like to mention uh a tool which is the visual ocean cli developed by us so that the community can leverage the asset data extracted after analyzing millions of mobile apps if you want to try out for yourself and see the security score of any mobile app currently installed on your phone we would encourage you to go to bvigil.com and search for the app there as you all know that the first steps
towards change is awareness with that thought we invite any questions that the audience might have
[Applause] okay wonderful thank you very much very interesting talk any questions just come in front here and ask your question they should be able to hear you
they are far away they can't bite you
i know everything about the project
okay so still thank you very much for the question for the talk and do you have any contact information or there should be contact information on the website if you have any additional questions thanks a lot and enjoy the rest of the day
you