
Gary Robinson & Yan Huang - The Path to Self-Securing Software

BSides Belfast · 2017 · 33:52 · 44 views · Published 2017-10

...around the area of security, especially in the area of the code, and using the code to try and drive some of that. I just want to get a feel for the audience here. Could you give me a quick show of hands: hands up who would be from a testing or penetration testing background? OK, not too many. How many would have more of a development or coding background? All right, great. And how many would be from, say, a management or risk management background? And for those of you who haven't put your hands up, what the hell do you do? So we're going to talk about this research we've done. I'm going to give a couple of provisos first. I'm going to

talk about it first, then Yan's going to do the technical part, and then I'm going to come back and talk about some of the future research. This was driven by something that Uleska started with CSIT. Uleska is the company I run. We do services, pen testing and so on, but we also have a product. This is not going to be a product presentation; we've very carefully made sure there's nothing in here about the product. It's all about the technology in the background. So some things in here we might do, some things we won't do, and we're not going to tell you which. I just want to talk about the actual technology. So let me just start

the timer here. So, myself: I'm founder and director of a company called Uleska. I'm also involved in OWASP as a board member. I also used to be a security architect at Citigroup, and that's actually where some of this impetus, or these ideas, came from, because I saw, when you're working in a large bank, the scale of cybersecurity: you could triple or quadruple the number of security people in that company and you're still going to have a hard time scaling it to the whole software organization. And Yan is from CSIT; he'll introduce himself when he gets to his bit. So, software development has accelerated and improved in every other area apart from cybersecurity. As a developer you really

feel that. We've got continuous integration, your Jenkins, your DevOps; we have agile; we have all these ways of developing software a lot faster than we did when I started. When I started, you had a C++ main function and away you go. Nobody starts with the main function now: you stick in Apache, you stick in something else, and you build on top of that. You have four or five lines of code running a whole application. So we've definitely sped up everything but cybersecurity. Trying to secure all that still takes weeks and months. One set of figures says that the average time for a high-risk issue to be fixed, from the time of the security test, is six

months. So there's a long delay where security issues are sitting in the software. And then there's the fact that the application security industry already knows how to fix every issue: cross-site scripting, SQL injection, we already know how to do those things, yet we still see a lot of those issues. Interestingly, as a parent, and this is kind of a slight aside, you get really pissed off whenever you see these things happening when we already know how to fix them. Any time you see those issues come along in the news, you think: we know how to fix that, that's from 20 years ago. Now, as a parent, my kids are around nine and ten, and I'm thinking in 10 years' time they're

going to be driving cars run by this stuff. They're going to be putting their details and their bank accounts into whatever, probably built by the lowest bidder, by a coder who's the cheapest one they could find. So we want to try to make sure we're able to secure these things more easily. To me the issue is actually execution, not understanding: we know how to fix a SQL injection issue; the issue is making sure it's covered everywhere it needs to be. So where we want to go: we already know there are not enough people to do it, and we already know whether the skills do or do not exist in the industry. What we want, and I've put it here on the right-hand side,

is secure software. That is our ultimate goal, and that's what we do right now with the security testing we do, or the scanning, whatever. Now, if we want to automate this and not have it be such a manual task, we have to see what inputs can be used to produce that security. On the left-hand side here I've put the documentation, and the bottom picture is just code, as inputs we could use to try and understand what security we need to apply. Because any pen tester knows the security information, but they have to know how to apply that to the particular application that's being tested. A colleague of mine had done a lot of work

a number of years ago on trying to understand security documentation. He wanted to try and analyze that and see if he could work out automatically the attack surface of the application from the documentation. Now, to me that's insane: (a) it doesn't bloody exist half the time, (b) it doesn't bloody exist most of the time, and (c) it's usually not very specific. I know that even when we do pen tests for customers and you get the documentation of what the application is, it's a quick diagram and you don't have much else to work from. So using the documentation is very hard. Plus the documentation is written in English, which is hard to parse

and hard to understand for programs. A much better place to start is the code, because the code is the truth. The code is saying: I have a web servlet here, I store this in the database. And we could use that, especially with things like Chef, environments, etc., to really understand what is in the application based on the code. That's what the focus of our research was here. So, as they say in the adverts, this is the sciencey bit, and I'm going to pass it over to the expert. Yan is going to explain how they got involved, what they did, and how they built a scanner to understand from the code what the

overall attack surface of the application is, automatically. So, over to Yan. Hi, my name is Yan. I'm a senior engineer from CSIT, that is the Centre for Secure Information Technologies, which belongs to Queen's University Belfast. CSIT is the UK's national Innovation and Knowledge Centre for cybersecurity. Recently CSIT launched a programme called CSIT Labs, which is an incubator programme designed to support new cyber security ventures to start, scale and engineer an MVP, a minimum viable product. For the successful candidates of CSIT Labs, we provide three months of fully funded engineering support for the production of prototype or proof-of-concept technologies. You can go to the CSIT Labs website to find

more information, because the next round of applications will start next month. Uleska is one of the successful applicants to CSIT Labs. So, when Gary first asked me to develop a source code scanner that is able to find web page information from Java source code, servlets as well as the POST and GET parameters, the first thing that popped up in my mind was to use regular expressions to search for certain patterns. So I did some research, and I found out that of course a regular expression can only parse a regular language. According to the Chomsky hierarchy there are four types of grammar for languages, and from top to

bottom they are type 0 to type 3. A regular language is a type 3 grammar; however, the Java programming language is a type 0 language, which is a recursively enumerable language. So it's impossible to parse the, like, infinitely nested structure of the Java language by using regular expressions. That's why I needed to find another approach. So I looked at language parsers, and JavaParser is the first parser I used for this Uleska project. JavaParser is an open source library which allows you to interact with Java source code as a Java object representation in a Java environment.
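
As a rough illustration of the limitation described above (this is not code from the talk): Java's syntax is normally specified with a context-free grammar, and arbitrarily deep nesting of braces is already more than a plain regular expression can track. The sketch below is written in Python purely for brevity. The Java snippet and the helper names `body_by_regex` and `body_by_counting` are hypothetical, and the depth counter is not a parser either: it ignores braces inside strings and comments.

```python
import re

# Hypothetical Java snippet with a nested block inside a method body.
JAVA_SOURCE = """
class Demo {
    void doGet() {
        if (ok) { handle(); }
        done();
    }
}
"""

def body_by_regex(source: str, method: str) -> str:
    """Naive attempt: take everything up to the first closing brace."""
    match = re.search(re.escape(method) + r"\(\)\s*\{(.*?)\}", source, re.DOTALL)
    return match.group(1) if match else ""

def body_by_counting(source: str, method: str) -> str:
    """Track brace depth explicitly, which the regex above cannot do."""
    start = source.index("{", source.index(method + "()"))
    depth = 0
    for i in range(start, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[start + 1:i]
    raise ValueError("unbalanced braces")

regex_body = body_by_regex(JAVA_SOURCE, "doGet")
counted_body = body_by_counting(JAVA_SOURCE, "doGet")
print("done();" in regex_body)    # the regex stops at the inner closing brace
print("done();" in counted_body)  # the counter reaches the real end of the method
```

The regex cuts the method body short at the first inner `}`, which is the kind of failure that pushes a scanner towards a real parser.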

Here we refer to the Java object representation as the abstract syntax tree. I'll give you a very simple example. Here we have just a Java class; if you run this Java class, it just prints out the time. If you use JavaParser, you can parse this source code into a syntax tree. As you can see, the root of this tree is called a compilation unit, and this node represents the whole source file. Then it has three child nodes: one for the package declaration, one for the import, and one for the class declaration. What you can see here is

not the whole tree. Consider this triangle notation you see: it indicates that this portion has been summarized; actually it can branch out further. Let's take an example. I just choose one node, the method declaration, and elaborate it further. You can find out the return type of this method, the name of this method, as well as the parameters and the block statement. Again, the last two nodes can be elaborated further. So after you use the parser to parse the source code into a tree, what you can do next is this: JavaParser implements the visitor pattern, so you don't need to

traverse the whole tree yourself. You only focus on what you are interested in, then define the visitor for the nodes you are interested in. After you analyze this tree and modify the nodes, you can also generate new source code from the modified tree. Actually, I love JavaParser: it is easy to use and it's well documented. But later on Gary asked me to do very similar things for C# source code, and JavaParser is only for the Java language. So this is where ANTLR comes in. ANTLR stands for ANother Tool for

Language Recognition. It is also an open source library. ANTLR can not only parse C# code; actually it can parse any language, as long as you have the grammar of that language. When we use ANTLR, compared to JavaParser, there is a little bit more you need to do. First, you need to pass the C# grammar to ANTLR, and ANTLR will generate the C# parser. This is just a one-time execution; you don't need to do it every time. After you get this C# parser, the next steps are very similar to using JavaParser.
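
The parse-then-visit workflow described above can be sketched without either library. This is only an analogy, not the speaker's code: it uses Python's standard `ast` module on a small Python function as a stand-in for JavaParser or an ANTLR-generated parser, with the module root playing the role of the compilation unit. The class name `MethodCollector` and the sample function `show_time` are hypothetical.

```python
import ast

# Small sample: one import and one function, loosely mirroring the
# "class that prints the time" example from the talk.
SOURCE = '''
import datetime

def show_time(prefix: str, utc: bool) -> str:
    now = datetime.datetime.now()
    return prefix + str(now)
'''

class MethodCollector(ast.NodeVisitor):
    """Handle only function definitions; all other nodes are passed over."""

    def __init__(self):
        self.methods = []

    def visit_FunctionDef(self, node):
        self.methods.append({
            "name": node.name,
            "parameters": [arg.arg for arg in node.args.args],
            # Simple named return types only; None if absent or more complex.
            "return_type": getattr(node.returns, "id", None),
        })
        self.generic_visit(node)  # keep walking into nested definitions

tree = ast.parse(SOURCE)  # the root node represents the whole source file
top_level = [type(child).__name__ for child in tree.body]
collector = MethodCollector()
collector.visit(tree)
print(top_level)
print(collector.methods)
```

Only `visit_FunctionDef` is written by hand; the base class does the traversal, which is the convenience the visitor pattern is meant to provide.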

Actually, ANTLR has two working mechanisms. One is the same as JavaParser uses, the visitor; the other one is called the listener. Now I'm going to show you a very simple example of how to use this listener. As you can see, we have a very simple statement, an assignment statement, which assigns the value 100 to the variable sp. At the top you can see the syntax tree for this assignment statement. So how does this listener work? ANTLR will first generate a bunch of APIs, which are used when the tree walker traverses the tree. So, for example,

when the tree walker encounters the node assign, first the method called enterAssign will be invoked, and the assign context object will be passed to it. After that, the tree walker will traverse all the child nodes of this assign node, and after that the exitAssign method will be invoked. So what you need to do, because ANTLR already provides the API, is only override the methods you are interested in. As you can see at the top right of the figure, it shows the complete

sequence of calls made to the listener for this assignment statement. That is all about language parsers. Now let's go back to the Uleska project. I used these parsers to parse the source code, for example Java source code or C# source code, then I store all the output in a JSON file, and then Uleska will use it for further development. OK, so that's me. Thanks, Yan. And just to mention, CSIT Labs is actually a great programme: within three months they were able to analyze the different ways of doing this for us, build a

program that does it, and also test it for us. So now we have a facility where I just point a tool at the source code directory at the command line and run it. It does lots of fancy stuff and comes back to me with a JSON output, which is all the web interfaces of that code line and what parameters they take in, in GET and POST, and that's just a starting point. You can extend that out anywhere else you imagine: you could look at what files are opened, what database connections there are, and all that other stuff. So there's lots of interesting stuff they've been able to do here, which has been great. And, as we said before, we're trying to automate

here: we're trying to actually take a lot of the tasks of a human and make them easier. So, on the assumption that we took this forward and we have a complete project model, where we can actually scan a code line and understand the interfaces and all the things we want, what can we then do to try and get towards this area of, as the title of the talk says, self-securing code? What could we do there? Essentially what we can do is take this project model and understand it as a representation. Anybody here who is a pen tester will understand that that's just what you read from the documentation. This is

what you try to understand in your first days or weeks of working on a new assessment or a pen test, because you have to understand what the darn thing does. Is it taking credit card numbers, or is it taking bacon recipes? How much do we care about the different things? So imagine being able to get this project model digitized and hold it ourselves. I just want to note there that everything Yan worked on was open source. There's nothing proprietary there; all of that can be replicated by anybody else in this room. In fact, one of the reasons for Yan talking here was also to let you know what worked for him and what didn't, so if you're going to do the same thing

yourselves, you can do it a bit faster based off what Yan was researching. So we have a project model that tells us: is it a web application, is it a desktop one, what does it do? You could then take a security model. If anybody here works in a bank, or anywhere doing ISO 27001, you could get a security model of the security you need to apply. Is there encryption needed on certain types of data? What data levels are there: is it sensitive, is it public, is it informational? You could also then say: well, if something's going in or something's going out over a network interface, does that have to be encrypted, or does it just have to be protected with certain headers? So imagine the

technicality of the security model: probably what sits in your secure coding guidelines or your security standards. If you digitize that, and then combine the two with some sort of engine, you then have the ability to actually automate a lot of the security actions that need to be done. You won't do everything, but could we get 20%, 50%, 80% of it done? Can we cover a lot of the standard SQL injections and cross-site scripting injections? That then allows the pen tester to do more interesting stuff in the four weeks they have allocated, or can give something to the people who can't afford a penetration test, to provide a level of security for themselves. You'll

remember I used to work for Citigroup, obviously one of the biggest banks in the world, and I saw the security organization they had. It was very interesting then moving to found my own startup, and to that startup community, who are struggling to even get a developer to write something in four lines that is a website for them. I come along and say, "Well, is it secure?" and I won't say what they say back to me, because it's not polite, because they don't care. They're just concentrating on getting some functionality. They can't afford a 20k pen test until they're successful, until they actually get to the point where they're going to sell to RBS or sell to Citigroup or whatever. And then the banks come back

and say: well, here's the big spreadsheet of the security we assume you have in your system. And they go: oh dear, I don't have any of that. Then they have to run around to get a pen test and put the security back in. The problem with that is you're wasting weeks to months going back to recode what you've already done to make it secure. It would be nice if you actually had that heads-up. There's a company called Denim Group, over in the States in San Diego, who do similar things to us. They do a pretty interesting thing where they check every check-in in Git and run something like this, to say: what was that new interface? Have they put in a

new service? Let's be very aware of that right now and do the security stuff with that. So instantly you know what security is being applied by the developers. It's a pretty interesting way of going forward. But what can we do with this going forward, in a perfect world? Very simply, we could create our own documentation. Javadoc does this to an extent, but Javadoc is usually reliant on you having actually filled it out, it gets out of date, and it doesn't cover everything. Here we can actually use the engine to simply enumerate what the web interfaces are. You could even start adding in information, like the roles that use

those web interfaces. You could have extra stuff, like annotations, to declare the sensitivity of the data. This cc_card header or POST parameter coming in: is that sensitive information? If it is sensitive information, what does that mean to us when we go to the security document, the security policy, to understand that? So you can simply do documentation. You can describe data flows; there are certainly some programs out there that do similar things, but we can understand that from a security point of view. Sensitive data comes in from the website: is that external or internal? Is it stored in the database or sent off to another application? If it is, can we

model the other application and have a chain, actually, through our organization where we know the data flows and the sensitivities? If we have a chain of four applications, and this development team thought cc_number was actually public data, we can recognize: hold on, that's going to be an issue, because we're not protecting that in the third application. So there are other things you can do just by modelling the actual code, and using the code as the truth to understand what the application does speeds that up. A slightly more difficult thing we can try to do is combine it with the security model and actually generate code. It's mostly developers here, so if I said to you we have a security policy that

says credit card numbers are financial information, and the security standard says financial information has to be encrypted with AES-256, well, most developers could look at a piece of code, see the credit card number coming in, and know pretty quickly whether it's being encrypted or not, or is just being directly stored in the database. Now, if we can identify that quite quickly, can a code generation engine actually then generate a library, using OpenSSL, or reusing Spring Security or something else, to actually then say: here's a JAR and here's a DLL that we can plug in to do that? Now you get economies of scale there, because if you're saying

every credit card number, every piece of financial data, has to be encrypted with AES-256, you can push that out throughout your whole organization, your whole code base. And then later on, if you say, well, hold on, it has to be AES-512 now, in one place we change that and all the financial data is now being encrypted with the right algorithm. It's simplistic, but it's a way of moving forward and making it a lot easier. I do know from previous experience, and I won't say which particular customers or companies, that security decisions have not gone through, security changes have not gone through, because they just realized it was going to cost too much to manually go in, change the code, and analyze which systems need to have

the code changed. They just said: it costs too much, we're not changing shorter passwords to ten-character passwords, because we have done the analysis and it costs too much. So security is being hurt by the cost of how we do it. If we manage to automate a lot of this and bring down the cost, we can do security changes a lot more easily and a lot more efficiently. Security configuration: there's a company over in America that has a patent on this particular bit, which says: I now really understand what my network is, and I'm going to automatically create the rules for the firewalls or the IDSs or the network components, and change the configurations

to say: I know I need to protect that, because I know there's sensitive information coming in here. And by the way, oh gosh, there's acquisitions data in that lovely box there, so let's make sure the network around there is not available to the whole of our massive organization. We can tie that down a lot more easily if we have a model, brought up here, of what the architecture actually looks like. Security test generation: this can be plugged in with things like sqlmap, with things like Burp, or can even create our own Python or C++ or whatever libraries that allow us to test, because we know from the project model that there is an interface called /addcreditcard. Now, if we know they're going to install that from Jenkins that night on this box here, with a DNS address, we could automatically create a function or a tool that sends the requests in and says: right, I'm going to test that. What if I put in a correct credit card? Great. What if I put in an invalid credit card? Fine. And how does the encryption work? A lot of companies are moving towards that sort of automation, but they're manually creating the automation, which is very slow, very laborious and very costly. If we can automate the creation of the security tests, security becomes a lot easier again. There's also a scale to that as well.
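
The test-generation idea above can be sketched as follows. This is a minimal sketch under stated assumptions, not the product's behaviour: the talk only says the scanner emits JSON listing web interfaces and their GET and POST parameters, so the `PROJECT_MODEL` layout, the parameter names, the probe values, the host `test-box.example` and the function `generate_tests` are all hypothetical. The requests are built with the standard library but never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical scanner output; the real JSON layout is not given in the talk.
PROJECT_MODEL = json.loads("""
{
  "interfaces": [
    {"path": "/addcreditcard", "method": "POST", "parameters": ["cc_number", "name"]}
  ]
}
""")

# Illustrative probe values only: one well-formed test number, one malformed value.
PROBES = {"valid": "4111111111111111", "invalid": "not-a-card'--"}

def generate_tests(model: dict, base_url: str) -> list:
    """Build (label, Request) pairs for every interface in the model."""
    tests = []
    for interface in model["interfaces"]:
        for label, probe in PROBES.items():
            # Every parameter gets the same probe value in this simple sketch.
            fields = {name: probe for name in interface["parameters"]}
            body = urlencode(fields).encode("ascii")
            request = Request(base_url + interface["path"], data=body,
                              method=interface["method"])
            tests.append((label, request))
    return tests

tests = generate_tests(PROJECT_MODEL, "https://test-box.example")
for label, request in tests:
    print(label, request.get_method(), request.full_url)
```

Because the tests are derived from the model rather than written by hand, a newly scanned interface would gain probes without anyone editing the test suite.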

I do remember working in a previous company where we had a lot of automation tests, and about 60% of them were passing. Then something went wrong and it went down to 20%. We asked what had happened with the build, and it turned out something went wrong with the installation of the build. So the question then became not how did we go from 60% to 20%, but how on earth did 20% of the tests pass when the darn application wasn't even installed on the test box? So writing tests by hand is actually not a good enough system. Automating that, doing it once and repeating it, is a lot easier way to do it. And

then, the crux of what we're asking: is it pie in the sky, or is it something we could try and achieve at a smaller level? If you even think of IDEs, not Jenkins, sorry, Eclipse: if you want getters and setters, it can automatically create the code for you, and it can create the servlet code for you, that sort of stuff. So we've already got things that can create a bit of code for us. Back to what I was saying before: if we know a credit card number is coming in off this web service, and we know that a certain level of security has to be applied,

and we know we can create a library which can apply that security and give them an API to do that, can we automatically apply it in the code? Could we automatically create the library during the build, put it into the JARs that are needed for that, or the DLLs or whatever, and actually write in, just after they've read in the credit card number, the encrypt or the protect-credit-card function, and therefore automatically have this done a lot more smoothly and in a lot more automated way? Now, a credit card number is a very simple example. It gets stretched whenever you try to do all the harder things like headers, or when you're actually talking about stuff you store in the

database: how do we check that it's been encrypted in the right way, or whatever? But that's something we're going to try and research, to see if, and how much of, the security coding we can actually automate just from somebody putting in the code. We even think that if we put it in the IDE, we could put a plug-in into Eclipse that recognizes you're taking in something, and as soon as you say this POST parameter is sensitive, it automatically puts in the code to protect it. Now, is that possible? We haven't tried that yet. One of the reasons it's actually great to have an audience here full of coders is because, in my experience, and this is only anecdotal, a lot of

people in the security world, the testing world or the pen testing world aren't developers. The person I mentioned before, who was trying to use the documentation to try and work this out: it's because he knew how to parse documentation. He didn't even consider doing it from the code, because he wasn't a coder. But we're all coders; we all know the syntax and the controls there. This is slightly different from some of the other code scanners. The code scanners go directly from the code, using the syntax and semantics, to say you've got a SQL injection or something like that. They don't care, they don't know whether it's sensitive information or not, and they don't know what encryption is supposed to be applied.

They're just using syntax and semantics to say "SQL injection" or "this exception gets printed to a log", and that's why they have so many false positives. Or, not even so much false positives: it's that they raise issues that you don't care about. That's the other problem I have with source scanners. Here it is somewhat different: we're not going directly from the code to an issue. We're going from the code to a project model, so we can understand the project. We're simply saying it's a web service and it takes in a credit card number, and then we use the security engine and the security model to try and combine those two things and actually make some

interesting outputs. So that is something we are playing with, something we're going to look at down the line. One of the good things about showing this to people like yourselves is that if you have any input, if you have any experience in this area, if you've seen something else that does the same, or if you've tried this and you can say that will never work, or that would definitely work, please give us your feedback and let us know what you think, so we can try and make sure we're putting that into our development going forward. OK, thank you very much. Thank you. [Music] Does anybody have any questions for myself or Yan? Yes?
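
The closing distinction in the talk, going from code to a project model and then combining it with a digitized security model rather than going straight from code to an issue, can be sketched as a tiny rule engine. Everything here is an assumption made for illustration: the dictionary layouts, the `classification` and `encrypted` fields, and the function `required_actions` are hypothetical and are not taken from the speakers' tooling.

```python
# Hypothetical project model, as a scanner might emit it.
PROJECT_MODEL = {
    "type": "web_service",
    "interfaces": [
        {"path": "/addcreditcard",
         "parameters": [
             {"name": "cc_number", "classification": "financial", "encrypted": False},
             {"name": "nickname", "classification": "public", "encrypted": False},
         ]},
    ],
}

# Hypothetical digitized policy: data classification -> required control.
SECURITY_MODEL = {
    "financial": {"encryption": "AES-256"},
    "public": {},
}

def required_actions(project: dict, policy: dict) -> list:
    """List the parameters whose classification demands a control they lack."""
    actions = []
    for interface in project["interfaces"]:
        for param in interface["parameters"]:
            rule = policy.get(param["classification"], {})
            if "encryption" in rule and not param["encrypted"]:
                actions.append(
                    f'{interface["path"]}: encrypt {param["name"]} with {rule["encryption"]}'
                )
    return actions

actions = required_actions(PROJECT_MODEL, SECURITY_MODEL)
print(actions)
```

Keeping the policy separate from the project model means a change of standard is made once in `SECURITY_MODEL` and re-evaluated against every scanned application.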

Yep. So, to your point, all you have in the code is the name of a variable, and you haven't a clue what that is. Now, if they call it cc_number, that's quite easy. They could call it abc and you haven't a clue what the hell that is. So for that question, there are a couple of ways you could do it. One is to have an annotation above it saying this is financial data, so that you know. Or you could actually, and I don't think this would really take off, have a naming convention, you know, an underscore and then something at the end of everything. I don't think that would work as well. I

think the annotation would be better because it's more flexible, but you would have to elaborate some of your code for that. But if you talk to people like Pivotal and other large companies out there, they're moving towards infrastructure as code anyway, so they're looking to keep everything in there. They'd actually love to keep all the security information in the code, because then, if the whole team disappears, anybody coming in says: oh, I know what this does, because I know this is financial data. One of the biggest problems I've had in my time, either as a security architect or a pen tester, is talking to people in the team who just don't

know the software. They've been in the team for months, but they didn't work on that bit, and they didn't realize there was an API over here that had a web service that allows you to put in the SQL you want executed in the database. That's one of the problems we have. Yes?
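
The annotation approach from this answer can be sketched as follows. In Java this would presumably be a custom annotation on the field; the sketch uses Python's `typing.Annotated` as an analogue, which is an assumption made for illustration. The names `Sensitivity`, `AddCardRequest` and `sensitive_fields` are hypothetical.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

@dataclass(frozen=True)
class Sensitivity:
    """Hypothetical marker describing how a field must be treated."""
    level: str

@dataclass
class AddCardRequest:
    # The name 'abc' says nothing; the annotation carries the meaning.
    abc: Annotated[str, Sensitivity("financial")]
    nickname: str

def sensitive_fields(cls) -> dict:
    """Map field names to their declared sensitivity level, if any."""
    found = {}
    for name, hint in get_type_hints(cls, include_extras=True).items():
        # Only Annotated hints carry a __metadata__ tuple.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Sensitivity):
                found[name] = meta.level
    return found

fields = sensitive_fields(AddCardRequest)
print(fields)
```

Unlike a naming convention, the marker survives a rename of the variable, which is the flexibility argued for above.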

yes

Absolutely, and that is a challenge, absolutely, in large organizations and in the small ones that just don't do this right now; it's another thing to worry about. But to get to your point, you're saying: well, they don't have the annotations or anything in there right now, but here is the security task they have in front of them. At the end of the day they're going to have a pen test document or something like that after three months or whatever. If they then say: well, hold on, if I put the annotations in, that speeds everything up, I get the documentation quicker, and I could then get the pen test or whatever quicker,

that's a selling point. It's something where you have to show the value. But you're very right: it's not there right now, and it's something that's going to have to be encouraged to change. You're going to have to show people that putting that annotation in there saves you three weeks of work later on, and then it maybe could become part of the coding standard. But you're very right, that's something that needs to be challenged and encouraged, and probably driven by management in terms of the security angle. You're right. Yes, please? Well, at the moment, yes: we have trial websites that we have set up to do

this, and we've also got some code from some open source websites, just to make sure it works. Without going into too much detail, the product that's built off some of this is already in trials across the UK, so we're running it against code out there. Absolutely, a lot of the time it works fine; sometimes it fails miserably. That's just part of the development. One company we came across actually uses the javax web service annotations, the JAX-WS annotations. We don't cover that right now, but we know how to do it; it's just development work until it gets to that point. And that's a big part of the research that Yan and CSIT

Labs were able to do: to let us know how to do it and get to that point where we can extend it. Any other questions?

What we've talked about here is mainly the scanner, to be able to give me that project model. And as I said in the last few slides, you could do whatever you want with it. My product does some things around there; it doesn't do all of that. Some of the risk people I've talked to, and some people here are maybe in that area, in terms of risk, GRC, governance, risk and compliance, see an angle there, saying: well, if I understand even at a project level whether they're taking in financial information, even at that level, if you spread that across a Citigroup-size enterprise and you've got ten thousand projects, ten thousand applications, you could have everybody run a scanner against their code, and then it

means you can say: all right, we know which services we have that handle high-class financial information. So they see that sort of angle as well. I've mainly talked about the source scanner here, but even if you think about it at a very high level, you can check what libraries we have. Are we using a certain library? I know Heartbleed is used as an example so many times that I hate even mentioning it, but you could say: are we using the version that has Heartbleed, and are we using that in a banking website or a financial services website? Then you know your risk. So they love to know the status of

their risk, and love to have a pie chart saying what the risk of your sprawling organization is, because this helps with that. I've worked in companies, large companies, where the sh1t has hit the fan one way or another. Look at the Apache Struts problem: it can take weeks to understand even which applications you have that use that version of Apache Struts, and then how you're going to remediate that. So again, this sort of scanner could make that automated.
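
The inventory question raised in this answer, which applications use a flagged library version and how sensitive they are, can be sketched as a query over per-application project models. The application names, the model layout and the function `affected_apps` are hypothetical, and the version strings are placeholders chosen for illustration rather than a statement about which real releases are vulnerable.

```python
# Hypothetical per-application project models; names and versions are illustrative.
PROJECT_MODELS = [
    {"app": "payments-api", "data": ["financial"],
     "libraries": {"struts2-core": "2.3.5", "commons-io": "2.4"}},
    {"app": "recipes-site", "data": ["public"],
     "libraries": {"struts2-core": "2.3.5"}},
    {"app": "ledger", "data": ["financial"],
     "libraries": {"struts2-core": "2.5.13"}},
]

def affected_apps(models: list, library: str, bad_versions: set) -> list:
    """Return (app, handles_financial_data) for apps using a flagged version."""
    hits = []
    for model in models:
        version = model["libraries"].get(library)
        if version in bad_versions:
            hits.append((model["app"], "financial" in model["data"]))
    # Apps handling financial data sort first, as the higher-risk ones.
    return sorted(hits, key=lambda hit: not hit[1])

hits = affected_apps(PROJECT_MODELS, "struts2-core", {"2.3.5"})
print(hits)
```

With models already collected for every application, the weeks of manual discovery described above reduce to a lookup like this one.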