
Thank you, thank you very much for coming to this session, and good morning everyone. My name is Elisa, I am an academic researcher at King's College London, and today I have the pleasure of introducing this group of enthusiastic recent graduates from King's College London. I taught this group quantitative methods for two semesters, and I was looking for interesting ways for them to learn data analysis but also to familiarize themselves with the fascinating world of software security. We had the chance to partner with one of the leading companies in the sector, which you might know, Veracode, who were kind enough to give us their data on software vulnerabilities found by their platform
in their customers' applications. So today we are going to present the results of this analysis on patterns, trends, and practices in software security, and we are very happy to have the chance to present these results here at BSides London. This was also an example of academia-industry collaboration, and we hope it inspires many more. So without further ado, I leave you to this very exciting group. [Applause]

Hello everyone, I'll start you off with a brief description of the setting and the data we analyzed. Imagine you're the CISO of a company and your goal is to have a secure application.
To enforce regulations, and to enforce your own requirements on the application, you define a security policy. You can think of the security policy as the set of flaw types which are not allowed to be found in an application. Your team of developers will then upload the application to be scanned for flaws in the cloud environment offered by Veracode. Veracode will generate a report, and on the report you will be able to see the list of flaws found in the application: the flaw types, the severity score for each flaw, and whether or not it breaks the security policy you have set. Then, as the CISO, together with your team of developers, you're able to decide whether or not to fix a flaw; later we will see how different flaws can be addressed. Afterwards, you can upload the application again and have it rescanned, and the cycle repeats.
Before we start with the analysis, here is a quick summary of the data set we worked with. We had data covering 2,587 customers across 24 distinct industries over the period from the 1st of January 2019 to the 25th of January 2022. 95% of these customers performed scans during that period, and we had more than 4 million scans and 147,000 applications. This data comes from one single vendor, but as you're going to see, it is quite a comprehensive data set: it covers many flaw types, industry sectors, and so on. We mainly focused on detecting patterns over time and differences across industries. To give you a quick look at how comprehensive the data set is, you can see two histograms of the number of customers by industry and the number of applications by industry.
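To make this concrete, here is a minimal pandas sketch of how those two histograms could be computed; the file name and column names (customer_id, industry, app_id, scan_date) are assumptions, since the talk does not describe the raw schema.

```python
# Minimal sketch, assuming a flat scan-level export with hypothetical columns
# customer_id, industry, app_id, scan_date (the talk does not describe the schema).
import pandas as pd

scans = pd.read_csv("veracode_scans.csv", parse_dates=["scan_date"])  # hypothetical file

# Restrict to the study window described above.
window = scans[(scans["scan_date"] >= "2019-01-01") & (scans["scan_date"] <= "2022-01-25")]

# Counts behind the two histograms: customers and applications per industry.
customers_by_industry = window.groupby("industry")["customer_id"].nunique()
apps_by_industry = window.groupby("industry")["app_id"].nunique()
print(customers_by_industry.sort_values(ascending=False))
print(apps_by_industry.sort_values(ascending=False))
```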
Okay, so to begin with, we look at scan behavior across industries, which are denoted by the colorful bars as you have seen. First up, we looked at the mean number of scans per application, for applications with at least one scan. This is calculated by taking the total number of scans in a given industry divided by the total number of applications in that industry. In this graph, the black lines denote the 95% confidence intervals, which indicate the precision of the estimate: industries with more data have a smaller confidence interval, and therefore more precise estimates. What we see here is a large variation in the data. On the left-hand side of the graph you see the industries Manufacturing, Travel, Recreation and Leisure, Retail, and Government, which are industries that scan more often. The median is also depicted, by the dotted lines, which we have plotted to detect specific outliers in our data; the median also shows a generally decreasing trend from left to right.

The second way of analyzing industry variation in scanning patterns is looking at how regularly scans take place. We measure this by scan cadence, which is the average number of days between scans. This shows whether scans are spaced regularly across time, which has different security implications. As you can see on the graph, industries on the left scan less regularly and industries on the right scan more regularly, since the y-axis is the mean number of days between scans. Just like in the previous graph, the median and the 95% confidence intervals are plotted as well. A large number of industries show an average time between scans of 5 to 15 days, as depicted by the middle portion of the graph; the overall average time between scans across all industries is 12.14 days. This may be related to the increasing use of agile methodology. The two measures are actually correlated: we found that industries that scan more frequently also tend to scan more regularly, as we would expect. This is seen in the industries on the right-hand side of this graph
corresponding to those on the left-hand side of the previous graph; these include Retail, Government, and Travel, Recreation and Leisure.
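A sketch of how both measures could be computed from the same hypothetical scan-level frame as above: scans per application with a normal-approximation 95% confidence interval, and scan cadence as the mean days between consecutive scans of the same application.

```python
# Sketch of the two industry measures, reusing the hypothetical `scans` frame above.
import numpy as np
import pandas as pd

# Scans per application, per industry.
per_app = scans.groupby(["industry", "app_id"]).size().rename("n_scans").reset_index()

def mean_ci(x: pd.Series) -> pd.Series:
    """Mean with a normal-approximation 95% confidence interval, plus the median."""
    m, se = x.mean(), x.std(ddof=1) / np.sqrt(len(x))
    return pd.Series({"mean": m, "lo95": m - 1.96 * se, "hi95": m + 1.96 * se,
                      "median": x.median()})

scan_freq = per_app.groupby("industry")["n_scans"].apply(mean_ci).unstack()

# Scan cadence: mean days between consecutive scans of the same application.
ordered = scans.sort_values(["industry", "app_id", "scan_date"])
ordered["gap_days"] = ordered.groupby(["industry", "app_id"])["scan_date"].diff().dt.days
cadence = ordered.groupby("industry")["gap_days"].agg(["mean", "median"])
```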
Next, we decided to look at the behavior of scans across time. We were particularly interested in cyclicality, that is, how the number of scans varies throughout the year, and we were also very interested in how scan behavior changed after important dates. Firstly, we decided to normalize the number of scans by the number of customers, so we could see whether an increase in scans was due to people scanning more or simply to growth in Veracode's customer base. Even though we noticed a steady increase in customers over time, we saw that the increase in the number of scans per customer was bigger than the trend in the number of customers. We also noted quite a lot of cyclicality, especially in December, where you can see drops every year; this is probably due to fewer scans as people go home over the holidays. We also noticed a big increase in the year 2021, breaking the trend. When we focus on 2021 in particular, we noticed a very big spike after the Log4Shell incident. To give you an idea of how big this spike was: it was an increase of around five scans per existing customer, whereas before that, the maximum increase over the same number of weeks was only 3.5 scans per customer. So it was a very big spike indeed. With that, I'm passing the word over.
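A sketch of the normalization step just described, weekly scan counts divided by the number of customers active that week; the Log4Shell date used is the public disclosure of CVE-2021-44228 (10 December 2021), and the column names are again the assumed ones from above.

```python
# Sketch: weekly scans per active customer, reusing the hypothetical `scans` frame.
by_date = scans.set_index("scan_date")
weekly_scans = by_date.resample("W").size()
weekly_customers = by_date.resample("W")["customer_id"].nunique()
scans_per_customer = weekly_scans / weekly_customers

# Inspect the weeks around the Log4Shell disclosure (2021-12-10).
print(scans_per_customer.loc["2021-11-15":"2022-01-15"])
```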
Next we looked at the most frequent flaw types across industries, and what we can see from the graph is that the most frequent vulnerabilities are cross-site scripting, CRLF injection, and information leakage. For this calculation we were very concerned about the representativeness of our data, because we only have Veracode customers, so we decided to normalize the numbers: in the little calculation on the right-hand side, we divided the number of flaws in an industry by the number of applications in that industry. So the numbers on the bars are the average number of that flaw type per application in the specific industry. Looking at the 52 cross-site scripting flaws in Retail and Hospitality, that means on average there are 52 cross-site scripting flaws per application in Retail and Hospitality; however, twice as many can be seen in the Technology sector, so on average 109 per application. Looking at general patterns, we can see that the Technology industry experiences the highest number of flaws per application across the three most frequent flaw types: they have a lot of cross-site scripting flaws per application, and also CRLF injection and information leakage problems. Interestingly, the Government also has problems with cross-site scripting, with on average 89 flaws per application, but less CRLF injection and information leakage.
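The per-application normalization on the slide amounts to a single division; a sketch, assuming a hypothetical flaw-level frame `flaws` with columns industry, app_id, and flaw_type:

```python
# Sketch of flaws per application by industry, with hypothetical columns
# industry, app_id, flaw_type on a flaw-level frame `flaws`.
flaw_counts = flaws.groupby(["industry", "flaw_type"]).size()
app_counts = flaws.groupby("industry")["app_id"].nunique()
flaws_per_app = flaw_counts.div(app_counts, level="industry")
# e.g. the slide's ~109 XSS flaws per application in Technology would be
# flaws_per_app.loc[("Technology", "Cross-Site Scripting")]
```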
Next, we were interested in which industries have the most severe flaws. To do that, we calculated a regression where the outcome variable is the number of severe flaws, and the whole calculation was again at the application level. In the red box you can see the coefficients for the different industries, and we can see that in general there is a trend of increasing numbers of severe flaws within that box; especially Software and Internet is an industry with a high number of severe flaws. Just to explain how exactly that was calculated: we decided to compare everything to a baseline, Financial Services. So the numbers you see mean, for example, that Software and Internet in 2019 had on average 122 severe flaws more per application than the Financial Services industry. What's also interesting is that we controlled for other factors that could potentially influence the number of severe flaws, such as business criticality, whether the customer requested a consultation, customer age, and also the programming language. Interestingly, the criticality of the application had no significant effect on the number of severe flaws; one might expect that more business-critical applications are actually more secure, but that doesn't seem to be the case. In terms of programming languages we can see very high variation, but we didn't go into more detail on that.
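The talk does not name the exact estimator, so this is only a sketch of what such a regression could look like: severe-flaw counts on industry dummies with Financial Services as the reference category, plus the listed controls. All column names are assumptions, and the industry-by-year interaction is inferred from the per-year comparison quoted above.

```python
# Sketch of the severe-flaw regression at the application level, with a
# hypothetical frame `apps` (columns: n_severe_flaws, industry, year,
# business_criticality, consultation_requested, customer_age, language).
import statsmodels.formula.api as smf

model = smf.ols(
    "n_severe_flaws ~ C(industry, Treatment(reference='Financial Services'))"
    " * C(year)"  # industry-by-year terms, e.g. Software and Internet in 2019
    " + C(business_criticality) + consultation_requested + customer_age + C(language)",
    data=apps,
).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
print(model.summary())
```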
In the next step we were more concerned with industry maturity in general: which industries are the most, or least, mature in software security, and what really drives that maturity. Again we ran a regression. To determine which industries are software-security mature, in the sense of having optimized their way of detecting flaws, we used a binary classification: we distinguish between clients who have an API account, and so scan automatically, versus clients who have a manual account, and we assume those who scan automatically are more mature. The classification also takes into account other variables that could potentially affect maturity, such as time variation, the application's age, having requested a consultation, business criticality, and again the programming language.

When we look at the numbers, the industries with positive coefficients, especially the ones in the upper green box, are most strongly correlated with having an API account: Government, Travel, Recreation and Leisure, Education, Retail, and Financial Services seem to be more software-security mature in that sense. On the other hand, industries with a negative coefficient, such as Telecommunications, Wholesale and Distribution, Federal Government, and partner companies of Veracode, are less correlated with having an API account, so least software-security mature. Looking at the control variables, we interestingly see a positive time trend, so over the years we see increasing software security maturity: approximately a one percentage point increase in API account use per 12 months. However, the application's age, having requested a consultation with an expert, or the business criticality don't influence how mature you are. Again interestingly, the programming language does have an impact on maturity, but those effects really vary: sometimes languages predicted fewer API accounts, sometimes more.
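A sketch of the maturity classification as described: a logistic regression of the API-account indicator on industry plus the listed controls. All frame and column names here are hypothetical.

```python
# Sketch of the maturity model: probability of scanning via an API account.
import statsmodels.formula.api as smf

maturity = smf.logit(
    "has_api_account ~ C(industry) + C(year) + app_age"
    " + consultation_requested + C(business_criticality) + C(language)",
    data=accounts,  # hypothetical customer/application-level frame
).fit()
# Positive industry coefficients => industry more strongly associated with
# automated (API) scanning, read here as higher software security maturity.
print(maturity.summary())
```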
With that, I now hand over to Alan.

Hi, next up we look into which flaws get fixed and what drives the decision to fix them, and we use two logistic regression models here. In the first model, the outcome variable equals one if the flaw is fixed in the code and zero if the flaw is left unresolved. We look at a number of independent variables that we believe are key drivers of the decision whether or not to fix a flaw. One intuitive driver is the flaw severity. The coefficient column tells us whether a flaw with a certain severity level is more or less likely to get fixed relative to our control group, flaws with an informational severity level: if the coefficient is positive, the flaw is more likely to get fixed; if it's negative, less likely. Here all the coefficients are positive, so we can say that all levels of severity increase the likelihood of a flaw getting fixed. This is quite intuitive, but as you may see, the highest severity level is not the one most likely to get fixed, and this may be because of difficulties in fixing those flaws, which we do not observe here in the data. In the second panel, we consider applications that are seen as business critical, and flaws in applications of medium business criticality and above are more likely to get fixed; the coefficients are both large in magnitude and statistically significant. Another key variable we look at is the number of scan types: an indicator ranging from one to four, indicating the number of different scan types an application has been scanned with. These include dynamic analysis, static analysis, SCA (software composition analysis), and penetration testing. Here again, the higher the number of different scan types, the more likely the flaw gets fixed.

In the second model, the outcome variable equals one if the flaw is fixed or mitigated in some other way, and zero if it's left unresolved. You can see the coefficients are very similar to the previous model, except for the very high severity level, where the coefficient is larger in magnitude and becomes statistically significant. We interpret this result as very high severity flaws being more likely to be mitigated rather than fixed, which explains why very high severity becomes large and significant only in the second model.
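A sketch of the two fix-decision models just described; `fixed` and `mitigated` are assumed 0/1 indicators on a hypothetical flaw-level frame, and 'Informational' is the severity reference category as in the talk.

```python
# Sketch of the two logistic regressions on flaw fixing, with hypothetical
# flaw-level columns: fixed, mitigated, severity, business_criticality, n_scan_types.
import statsmodels.formula.api as smf

rhs = ("C(severity, Treatment(reference='Informational'))"
       " + C(business_criticality) + n_scan_types")

# Model 1: fixed in the code (1) vs left unresolved (0).
m1 = smf.logit(f"fixed ~ {rhs}", data=flaws).fit()

# Model 2: fixed OR otherwise mitigated (1) vs left unresolved (0).
flaws["fixed_or_mitigated"] = ((flaws["fixed"] == 1) | (flaws["mitigated"] == 1)).astype(int)
m2 = smf.logit(f"fixed_or_mitigated ~ {rhs}", data=flaws).fit()
```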
Okay, so next we ask the question: how quickly do these flaws get fixed? We answer by looking at the time it takes to fix 50% of all flaws by industry, also known as the flaw half-life. We start by calculating the lifetime, also called the duration, of flaws in days, from the moment they are first found to the moment they are resolved, and then we find the half-life by industry. We can see that the average time it took for 50% of all flaws to be fixed across industries was 121 days, but you can also see that the inter-industry variation was very large: Agriculture and Mining, the fastest industry, had a half-life of just 29 days, whereas Consumer Services took 341 days to fix half of all flaws found in their applications. To illustrate the scale of the difference between the slowest and fastest industries, we have also plotted the proportion of unresolved flaws over time, and you can see, for example, that after 20 months in Agriculture and Mining almost no flaw remains, whereas even after 30 months in Consumer Services almost a quarter of flaws remain. We will later show that this is in part because of the different nature of flaws faced by different industries.

Next, we ask whether specific industries get faster at fixing flaws over time. To do this, we compute the number of days it takes to fix 50% of the flaws arising in each year, for each industry and then for the entire sample. To explain the columns in the table: the 2019 column refers to the number of days it took to fix 50% of the flaws found in the year 2019, the 2020 column to the flaws found in 2020, and so on. If you take the Total as an example, that is all the industries pooled together, we can see that it took 94 days to fix 50% of the flaws found in 2019, 141 days for the flaws found in 2020, and 80 days for the flaws found in 2021. Overall, the pattern here is not that easy to summarize; it's quite heterogeneous. Some industries get faster over time, some get slower, and some have jumps, like the Total. But a few industries caught our attention, which I've highlighted in green. If you look at the bottom right corner, you can see that Real Estate and Healthcare are two industries that actually get much faster at fixing flaws over time: for example, Real Estate goes from over 500 days to fix 50% of the flaws to only 12 days in 2021. Some other industries show an inverse pattern, such as Consumer Services, which actually gets slower at fixing flaws over time. We should keep in mind that these are unconditional estimates, which means we're not controlling for anything, so they are only suggestive. If we wanted to learn more, we would have to move to a parametric model, a model that takes all those other factors into account at the same time and controls for them, which we will do over the next few slides.
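A sketch of the half-life computation; found_date and resolved_date are assumed columns, with resolved_date missing (NaT) for flaws still open at the end of the window. The Kaplan-Meier variant accounts for those still-open (censored) flaws rather than dropping them.

```python
# Sketch of the flaw half-life, with hypothetical columns found_date / resolved_date.
import pandas as pd
from lifelines import KaplanMeierFitter

end_of_window = pd.Timestamp("2022-01-25")
resolved = flaws["resolved_date"].notna()
lifetime = (flaws["resolved_date"].fillna(end_of_window) - flaws["found_date"]).dt.days

# Censoring-aware half-life for one industry: time until 50% of flaws are fixed.
mask = flaws["industry"] == "Agriculture and Mining"
kmf = KaplanMeierFitter().fit(lifetime[mask], event_observed=resolved[mask])
print(kmf.median_survival_time_)  # ~29 days on the talk's data
```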
So we just saw that there is quite some variation in flaw fixing times, both over time and across industries. The next step is to ask why that is, and to do that we estimated a parametric survival model. The outcome we want to explain with our model is the lifetime in days of each flaw, from the time it is first detected in a scan to the time it is resolved. At first we estimate this model pooled, with all the data over all three years and across all industries, and we estimate the influence of four variables of interest on the time it takes to fix the flaw. First, we look at whether that kind of flaw is permitted to go unaddressed under the respective business's security policy, or whether there is a requirement to address it. Second, we look at how severe the flaw is; third, how critical the application is for that developer; and lastly, how old the application is. We also control for a couple more variables: first, whether the business requested an expert consultation from Veracode about that application; second, how long they've been using the scanning service; third, which programming language is primarily used in the application; and finally, whether the application was submitted only for, say, a static analysis or for multiple types of scan, for example also software composition analysis and manual penetration testing.

One way to compare the effect of these variables on flaw fixing time is to look at hazard ratios, a relative measure of how quickly flaws are fixed. A hazard ratio of 1 would mean that the variable we're looking at is not associated with the outcome; a hazard ratio greater than one means that flaws with that characteristic tend to be fixed faster than those without; and a hazard ratio of less than one would mean slower flaw fixing. Here we can see the results of that first model.
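The talk fits a parametric survival model without naming the exact specification, so this sketch swaps in a Cox proportional-hazards model, which is semi-parametric but yields the hazard ratios the slides report directly. All column names are assumptions carried over from the earlier sketches.

```python
# Sketch of a survival model for flaw lifetimes with hazard-ratio output.
import pandas as pd
from lifelines import CoxPHFitter

cols = ["lifetime_days", "resolved", "policy_requires_fix", "severity",
        "business_criticality", "app_age", "consultation_requested",
        "customer_tenure", "language", "n_scan_types"]
df = pd.get_dummies(flaws[cols],
                    columns=["severity", "business_criticality", "language"],
                    drop_first=True)

cph = CoxPHFitter()
cph.fit(df, duration_col="lifetime_days", event_col="resolved")
print(cph.summary["exp(coef)"])  # hazard ratios: >1 = fixed faster, <1 = slower
```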
Perhaps the clearest pattern is whether a flaw is allowed under the business's security policy or not. If you look at the very top bar in the figure, you can see that flaws that aren't required to be addressed are fixed about one-third slower than flaws that are required to be addressed. With regard to severity, we might have expected some kind of clear pattern, for instance that very severe flaws would be addressed with the greatest urgency, but the pattern is not actually as clear as one might expect. What we can say is that very low severity flaws are fixed about 20 to 30 percent slower than those with medium severity, but there is no significant difference between the time it takes to fix medium, high, or very high severity flaws. An application's age does not seem to have a significant impact on the time it takes to fix flaws in that application. Business criticality, though, is again a pattern that fits well with intuition: the more critical an application is for the business, the faster flaws in it get fixed. We can see that flaws in very low criticality applications are resolved about 30% slower than those in medium criticality applications.
Now that we've seen what a parametric model is, and now that you understand a little bit more about hazard ratios, which are the coefficients here again, we're going to run the same model, but for the flaws arising in each year and for only two industries. The two industries you see here are the two we identified earlier, Real Estate and Healthcare, the ones that got faster at fixing flaws over time, and we're going to dive a little closer into what may drive that pattern. If you look closely at the customer tenure, it appears that customers who have been with Veracode for longer get, with time, to fix flaws faster. Let me walk you through how to interpret the coefficients. The first coefficient, in 2019, is 0.749, which can be rounded to 0.75, meaning that flaws found in 2019 were 25% less likely to be fixed, given the tenure of the customer. But if you move to 2021, that coefficient becomes greater than one: 1.1 means flaws are 10% more likely to get fixed for customers of that tenure. So with time, customers actually get faster at fixing flaws.
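As a quick back-of-the-envelope check on that reading (not output from the actual model): a hazard ratio below one shrinks the instantaneous fixing rate, one above one raises it.

```python
# Worked interpretation of the two customer-tenure hazard ratios quoted above.
hr_2019, hr_2021 = 0.749, 1.10
print(f"2019: fixing rate about {100 * (1 - hr_2019):.0f}% lower")   # ~25% slower
print(f"2021: fixing rate about {100 * (hr_2021 - 1):.0f}% higher")  # ~10% faster
```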
If we now look at the severity of the flaws, there's an interesting pattern: in 2019, relatively low severity flaws are actually fixed slower relative to medium severity flaws, but these results become insignificant in 2020 and 2021, meaning there is no effect; you can see the stars are gone from the coefficients. You see the inverse pattern for relatively high severity flaws, which have no significant effect on flaw fixing rates in 2019, but are significant in 2020 and 2021, reducing the likelihood of the flaw being fixed by as much as 60 percent. Moving on to Healthcare, we can see that customer tenure, in green again, coincides with the earlier observations we made for Real Estate: with time, customer tenure increases the likelihood that a flaw is fixed. As you can see, we move from a coefficient below one in 2019 to a coefficient above one in 2021, which for 2021 corresponds to an increase in the likelihood of the flaw being fixed in that year of 16%, if we round it up. Now let's look at business criticality in the case of Healthcare. What is interesting here is that flaws in applications that are relatively critical for the business are more likely to be fixed in all years. As a comparison, we can compare very low and very high business criticality: flaws associated with very low business criticality have some of the lowest hazard ratios, reducing the likelihood that the flaw is fixed in 2020 by as much as 78%, whereas, on the last line of that red square, flaws associated with very high business criticality show the opposite, being sped up by as much as 75% in 2019 and 100% in 2020.
We can now conclude on everything we've seen today. We've understood that scans have increased over time, particularly so in 2021, and we've observed some cyclicality around the December period. We've also seen that some shocks to the system show up in the number of scans, such as the Log4Shell flaw becoming public. There's also high variability in behavior across industries, whether in terms of scan frequency, scan cadence, software security maturity, or the time it takes to fix 50% of the flaws, which can range from 29 days to 341. And we've seen, with the last part of our analysis, that some industries actually get faster at fixing flaws over time while some get slower, such as Consumer Services, and that the rate of flaw fixing is sped up by the fact that a flaw is not allowed under the client's policy, by the business criticality of the application the flaw is in, and, most importantly, by the customer's tenure with Veracode. Thank you for having us and listening. [Applause]

We're happy to answer any questions you may have at this point.

Can I ask a question? You have these shocks to the system, like Log4j. Have you been able to look at any examples of, say, releases of Android versions or iOS versions or other operating system versions over that time?

As far as I know, we didn't consider those, just because we had limited time and effort; it's another research question.

There you go. Yeah, of course, and let me pass this around.
Thank you. Hello, you mentioned business criticality as one of the aspects you took into consideration; how do you measure business criticality?

I think that's in the data set that was given to us by Veracode. They actually have a very detailed explanation and code book for each variable, and they have coded business criticality on something like four points, which I imagine, they didn't say, but I imagine would be feedback from the customer saying which apps they consider important and which not. When they register their account, I imagine they have to say how important that app is to their business.

So for the sectors where you saw big changes, either positive or negative, do you know from Veracode how many customers they had in that sector? Could it just be one or two people getting their act together?

Could you clarify, are you talking about the time they took to fix?

I know you're referring to this one, where you show the different sectors, and one of the sectors got much better and one got much worse.

Yeah, so Healthcare and Real Estate were getting better and Consumer Services was getting slower. From what I remember, Real Estate and Healthcare represent about four and five percent of Veracode's customers each, so in total about 10%, so there are actually quite a lot of observations. If you move to this table here, you can see at the bottom the number of observations for each year: a minimum of 200,000 observations per year, which would suggest the estimates are quite accurate.
On the flaw fixing: do we know from the data set whether the flaws have potentially just been ignored, and that's what's considered fixed, or whether they were actually fixed? What does the data set include?

We interpreted flaws that are deliberately left unfixed, where the customer said we don't want to fix this, as resolved; in the survival analysis we treated them as resolved. We do know whether a flaw was addressed or not: in the case where the flaw wasn't fixed in the code, it might have been mitigated in other ways by the customer, so the customer stated they have taken appropriate measures so that it's not that bad. For example, in this model we consider separately whether a flaw was actually fixed in the code or mitigated; a customer could also register a flaw as "we just accept the risk of that flaw and we won't do anything about it," but that is a separate category from just left unresolved.

So is that information not part of this, or is it also included in this whole data set?

In those cases where the customer said they are not going to do anything about it, those would be treated as resolved in our data set, and those where we don't know anything would be treated as left unfixed.

I have a question regarding the particular customers of Veracode, and potentially this information is missing from your data set, but do you have an understanding of whether these applications were used by high volumes of users, or whether these were perhaps just minor applications that happened to end up in your data set?

I apologize, but we don't have any data on how the applications are used or by whom, just how critical they are for the business.

Well, I don't think there are any more questions, so if you'd just like to join me in thanking them all again. Thank you very much. [Applause]