
We're going to introduce two important and interesting security challenges in software development, and how we can use static analysis tools like CodeQL to identify and address these issues before they become bigger issues. My name is Tong, and I am a software engineer on the Microsoft security code analysis platform team. And my name is Asha; I'm on the same team as Tong. Let's pause for a moment and imagine ourselves in the year 2035. What is the life you expect? Flying cars? Humanoid robots? Well, I don't know, but there's one thing for sure: technologies will undoubtedly evolve at an even faster pace than we expect. However, there's one imminent challenge that we need to prepare for
now: Q-Day. The name sounds cute, but the concept is quite a bit scary. The Q here stands for quantum. Everyone has heard of quantum before; it's not just some vague sci-fi or spiritual concept. It's real and it's coming. Q-Day refers to the moment when quantum computers become powerful enough to break the cryptographic systems that protect our cyber world today. Many experts believe that Q-Day could arrive as early as the next 10 years. In fact, the National Security Agency has set a 2035 deadline for adopting post-quantum cryptography (PQC) across all national security systems. So why are quantum computers so powerful? Well, traditional computers use
bits, and these bits are binary, meaning that they can be either a zero or a one. Think of flipping a coin. In classical computing, you flip the coin and it lands either heads or tails: one outcome, never both. Well, in a very unlikely scenario the coin might land on its edge, and if that happens you should probably go buy a lottery ticket, but even then it's still neither heads nor tails. In quantum computing, however, flipping a coin is not an "or" relation but rather an "and" relation. Imagine the coin spinning in the air, existing as both heads and tails simultaneously. This is a state called superposition, which gives quantum computers a massive
advantage in processing speed because they can handle multiple possibilities at the same time. Now, this ability doesn't come easily. To keep a qubit's superposition stable, quantum computers need incredibly controlled environments. Devices like dilution refrigerators can keep quantum computers at temperatures as low as 2 millikelvin, so it's colder than deep space; it's the coldest place in the universe. This together gives quantum computers the ability to crack encryption easily. The most popular cryptography used in secure communications nowadays is asymmetric encryption. It encodes and decodes messages using a mathematically linked pair of a public key and a private key. Imagine Bob wants to send a message to Alice. Unlike in symmetric encryption,
where both Bob and Alice share the same key, in asymmetric encryption we use a pair of public and private keys. Bob will use Alice's public key to encrypt the message before sending it to her. This encrypted ciphertext is sent to Alice over a public channel. When Alice receives it, she uses her paired private key to decrypt the message. The important point here is that even if someone intercepts the ciphertext along the way, they cannot read it without having Alice's private key. Bob knows Alice's public key, which anyone can know, but only Alice has the corresponding private key, so not even Bob can decrypt the message. As long as
the private key is kept confidential, the data exchange is secure. But is asymmetric encryption truly safe forever? I'm going to use RSA as an example to explain. It's a very common, fundamental, and widely used asymmetric cryptosystem. Let's first look at why it was considered secure, and then see why it is going to be in danger if Q-Day arrives. RSA is an example of a trapdoor function, which means that it's easy to compute in one direction but extremely difficult to reverse. Imagine multiplying two prime numbers, for example 53 and 61. It's easy to calculate the product, even manually, which is 3233. But if I give you a large enough
number and ask you to figure out which two prime numbers were multiplied to get this result, it will be extremely difficult. What you see here on this slide is a product of two large prime numbers. Factoring this 1024-bit number could take a thousand years, even with the most powerful computers we have today. This is the backbone of how asymmetric encryption like RSA works. RSA relies on relatively simple mathematical concepts: prime numbers, which we just talked about, and modular arithmetic. I'm not going to cover all the math here, like how the equation is derived, but it's pretty straightforward, so if you're interested you can probably deduce the formula
on your own. What I want to highlight here are the two keys we just mentioned. The public key consists of two components: the modulus n, which is the product of two large prime numbers p and q that are kept secret, and the public exponent e, which ensures efficient encryption. The private key, on the other hand, is represented by the value d. In simple terms, it's calculated in such a way that it undoes the encryption done by e. So the security of RSA relies on the difficulty of factoring the large number n to get the two prime numbers p and q. Without knowing p and q, finding d, which is the private key, from
e, the public key, is computationally infeasible. RSA is powerful and has been widely used to secure the world for many years. But how do quantum computers threaten this, and why is RSA no longer considered fully safe when Q-Day arrives? This is where Shor's algorithm comes in. I'm not a quantum expert, so again I won't cover all the math and physics details behind Shor's algorithm, but I will try to explain it briefly. Shor's algorithm was developed by the American mathematician Peter Shor in 1994. It is a quantum algorithm for finding the prime factors of an integer using the quantum Fourier transform (QFT). The QFT is a key step where the algorithm finds
the period, or the pattern, in numbers. Quantum computers can use the characteristic of superposition we just talked about to explore many possibilities simultaneously, so they can find these periods or patterns really quickly. Once the pattern is found, it's used to factor large numbers, which breaks RSA entirely. This ability to factor large numbers quickly is what makes Shor's algorithm such a threat to modern encryption systems. RSA encryption depends on the assumption that factoring large numbers is hard, but Shor's algorithm drastically reduces the time that's needed to factor them using a quantum computer. So then you may want to know: what is the recommended key length for RSA in PQC
practice? As you can tell already, RSA key length affects the level of security. In this diagram you can see that as the length of the RSA key increases, the security level also increases. n-bit security, for example 80-bit security, means that the attacker would have to perform 2 to the nth, which is 2 to the 80th, operations to break it. Therefore, obviously, longer keys will in general be safer. The recommended length, however, is a tricky question. The standards are changing fast, so I want to be cautious and I'm not saying anything for sure here. Just to give you some insight, I'm showing you the CNSA guidelines. The CNSA, which is the Commercial National
Security Algorithm Suite, is a set of cryptographic algorithms promoted by the National Security Agency. In CNSA 1.0, which is the current standard, a minimum 3072-bit RSA key is suggested. However, the proposed future standard, CNSA 2.0, removes RSA entirely and suggests that it should be deprecated at that time. I've linked some helpful resources at the end, so check them out if you want to know more about the latest updates. Shor's algorithm presents a serious challenge not only to RSA encryption but also to other encryption methods. I'm quoting Arthur Herman, who is the director of the Quantum Alliance Initiative, that Q-Day is likely to come sooner than even quantum scientists
can predict, and the time to get ready to protect our vulnerable data and networks is now. We need to prepare for this quantum-safe future by considering post-quantum cryptography practices that can resist quantum threats. One of the key preparation methods we want to introduce here is static code analysis. The security standards are changing fast. We now have stricter rules for protecting data, but a lot of the older legacy code, or even the code we currently write daily, hasn't been updated to meet these new standards yet. It's very difficult to check manually where these vulnerabilities are in our source code, so static code analysis is important in software
engineering. It is a method of analyzing computer programs without executing them. It examines the source code to identify potential bugs and security vulnerabilities early in the development process. Unlike dynamic analysis, which requires running the program, static analysis evaluates the code's structure, syntax, and logic. In simpler terms, it's like proofreading code before it even runs, so that people can catch mistakes like outdated encryption methods, hardcoded credentials, or logic errors that could lead to bigger issues down the line, making software more secure and reliable. Let's take a look at some popular static analysis tools. You may have heard of some of them before. For example, Roslyn and PREfast are very lightweight tools that are fast and
don't require a build to run, but they only support C# and C++ respectively, and the scope is limited. Roslyn can only catch three flow steps, and PREfast is limited to function-level scope. I will talk more about what flow steps mean later and why they are important in static analysis. Semgrep is also lightweight, and it supports most coding languages and frameworks, including [unclear]. However, it only covers file-level scope and has limited semantics. SonarCloud offers great multi-language support and CI/CD integration. It's cloud-based, so it's easy to set up and integrate with various pipelines like GitHub or Azure. However, SonarCloud is a paid service, and it's not that powerful on deeper
security analysis. CodeQL is a free semantic code analysis engine. It was acquired by, and is now maintained by, GitHub. Asha and I are not on the product team, but we use and maintain CodeQL a lot in our work for static analysis. CodeQL supports full and thorough semantic analysis and can trace data flow across the global scope, so it's great for identifying complex vulnerabilities like SQL injection, cross-site scripting, or buffer overflow. The downside is that the learning curve is steep, so it can take some time before you are writing complex queries. It can also be slow on large codebases. If you compare these options, lightweight tools like Roslyn
or Semgrep are great for quick surface-level checks. They can also be easier when writing custom queries compared to CodeQL. CodeQL, on the other hand, is an overall comprehensive option when it comes to deep security analysis. We will use CodeQL today to illustrate how it can help detect hidden security vulnerabilities, like the bad usage of a small RSA key in code, for example. As you may have guessed from the name, CodeQL combines both code and query language. At its core, it treats code as data, so we can write CodeQL queries just like SQL queries to analyze code for issues related to security, correctness, maintainability, and readability. This video shows how to
create a CodeQL database in the VS Code terminal. The CodeQL CLI extracts relational data from the source code to create a CodeQL database that has the information on how the code is structured. It represents functions, variables, and data flows as queryable data points, so code plus QL lets you query code as if it were data. In the next video you can see how this database is used when writing CodeQL queries. Here's where the concept of the AST, or abstract syntax tree, comes into play. The AST shown in the red box here is a tree-like representation of the source code. By treating the code this way, CodeQL allows users to search their code at scale for
complex semantic patterns. Users can also customize the queries for specific patterns. OK, so back to RSA encryption. I want to connect the dots here and show you a practical example of how CodeQL can detect the usage of RSA in a codebase. We have a simplified example here in the Go language, where an RSA private key is generated using the standard crypto/rsa package. The user specifies the key length as 1024, which was disallowed by the National Institute of Standards and Technology about 10 years ago, back in 2013, so it's definitely weak and we should of course avoid using 1024-bit keys. So now, if the question is simply to detect
any RSA usage, no matter what the key length is and no matter how the key is used, we can write a super simple query in CodeQL to detect that. For example, in CodeQL for Go, the call in the red box, rsa.GenerateKey, is represented as a selector expression, and with a query like this, any line of code that calls the rsa package will be flagged. You can also see here that CodeQL queries follow the QL structure of a from-where-select statement, where the select statement includes the final alerts you want to create for the findings, such as: this usage is bad, please use a stronger key length, blah
blah. Now, I know some of you might be thinking: why not just do a Ctrl+F to search for keywords like RSA? Why use CodeQL? Well, one of the reasons is that the example on the previous slide was super simple code that only has a few lines. Imagine codebases that have millions of lines of code with really complex imports, naming, and structures. Searching manually is a very time-consuming effort. A more important reason here is that CodeQL can track the data flow from a source to a sink in the code. You have heard me talking about data flow several times, so here I'm going to explain it a bit. The actual thing we
want to catch here is not a keyword like RSA itself, but how the data flows in a dangerous way. A data flow usually consists of a source and a sink. A source is where untrusted data enters the system, such as variables, user input, etc. For instance, in our RSA example, the source is the small key length, 1024. The data flow ends in a sink, which is where this untrusted or weak data is used in a potentially harmful way. In this case, the small key length is used in the rsa.GenerateKey function as a parameter for generating a private key, which should be flagged because we don't want to use this small key length.
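The source-to-sink idea can be sketched in a few lines. The slide in the talk is Go; what follows is a JavaScript (Node.js) analogue under stated assumptions, not the speaker's code. The names `MIN_RSA_BITS`, `generateRsaKey`, `weakBits`, `forwarded`, and `tryWeakKey` are hypothetical, and the 3072-bit floor is the CNSA 1.0 figure quoted in the talk, assumed here as a threshold. The constant 1024 plays the role of the source and the `modulusLength` option of Node's `crypto.generateKeyPairSync` plays the role of the sink.

```javascript
const crypto = require("node:crypto");

// Assumption: 3072 bits is the CNSA 1.0 minimum quoted in the talk; it may change.
const MIN_RSA_BITS = 3072;

// Sink: the key length ends up as modulusLength in the key-generation call.
// The guard rejects weak values before the sink is reached.
function generateRsaKey(bits) {
  if (!Number.isInteger(bits) || bits < MIN_RSA_BITS) {
    throw new RangeError(`RSA key length ${bits} is below ${MIN_RSA_BITS} bits`);
  }
  return crypto.generateKeyPairSync("rsa", { modulusLength: bits });
}

// Source: a small integer constant, the kind of value a data flow query looks for.
const weakBits = 1024;
// The value is passed along unchanged, so a text search for "1024" next to the
// key-generation call would miss it, while data flow tracking would not.
const forwarded = weakBits;

// Hypothetical helper that reports what happens when the weak value reaches the sink.
function tryWeakKey() {
  try {
    generateRsaKey(forwarded);
    return "generated";
  } catch (err) {
    return err.message;
  }
}

console.log(tryWeakKey());
```

The guard only catches the weak value when the code runs; a static data flow query reports the same path from constant to call argument without executing anything, which is the point the talk is making.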
Asha is going to talk about taint tracking later, which is a more complex data flow scenario, but in this example it's pretty straightforward: an integer source flows directly into a function call sink. We can then write a more specific and complex query that utilizes the data flow tracking feature in CodeQL. In this more complex query, we define the source as any integer less than 3072, the key length in bits. I chose 3072 because that's the CNSA 1.0 suggested length, but keep in mind that it's subject to change. We then define the sink as the second parameter in the rsa.GenerateKey function call. So CodeQL is then able
to track the flow from the green box, which is the source, to the red box, which is the final sink of the source. The syntax might seem weird and not very intuitive. For example, you have four columns in the select statement instead of just one, but don't be scared; that's just how CodeQL is written for showing both the sink and the source in the final alerts. This is what you will get by running this query in VS Code. It will show you the bad source and the potential sinks of the source, and also the location info of these in the file. You can see that the blue 1024 here is a
clickable element, so clicking on it will take you directly to the line of code where the RSA key length is defined, which is the source. This is also an illustration of why CodeQL is powerful, because you can have primitive values as an AST node, so that you can trace the entire data flow. For more information on how to start your first query, we have included a resource section with helpful links at the end of the presentation for you to refer to. So I've only scratched the surface of PQC today. My team at Microsoft and some folks from GitHub and Santander Bank have worked on an exciting project that uses CodeQL to analyze the security status
of the top repos on GitHub to facilitate PQC adoption. You can learn more about this in the blog post linked here, or by watching the Black Hat presentation recording about this project. Well, to conclude my part, I guess the takeaway here is that we need to start preparing for post-quantum cryptography, but don't panic. I don't want to deliver the wrong message regarding crypto, so I'm being super cautious. I know NIST has proposed a pretty ambitious transition timeline before '33, and that the final FIPS-validated standards for the algorithms are out already. However, some of the work on how to adopt them, for example combining traditional methods like RSA with new
methods like the module-lattice-based mechanism, is still in draft form. While RSA is not broken yet, we have to assume a cryptographically relevant quantum computer may soon exist, and that we should take the appropriate steps to protect ourselves. In the interim, NIST suggests that the CNSA 1.0 compliance I showed previously should continue to be required. So let's start by practicing static analysis to check for weak crypto usage in our code. While PQC raises really big security questions, CodeQL is also a great tool for detecting more immediate vulnerabilities, often seemingly trivial yet very crucial ones. Next, I will hand it over to Asha, who will walk you through an example of
another interesting topic, the leap year crisis, since this year, 2024, is a leap year. She will also introduce other fascinating aspects of CodeQL, such as taint tracking or MRVA, that will give you more abilities and flexibility to write your own customized queries. All right, so to continue our presentation, I wanted to talk about a technical issue that's been affecting software systems worldwide, what's commonly referred to as the leap year bug. This may sound pretty small or obscure, but this bug has caused significant problems for systems ranging from personal devices to corporate servers. So what exactly is the leap year bug, and why does it matter? To understand the leap year bug,
we should probably review the concept of a leap year. As you probably know, most years have 365 days. However, the actual time it takes for the Earth to orbit the Sun is roughly 365.24 days, which is known as a tropical year. To account for this extra fraction of a day, we add one day, February 29th, every four years. But it's a bit more complex than that: a year is a leap year if it's divisible by four, but not if it's divisible by 100, unless it's also divisible by 400. For example, the year 2000 was a leap year, but the year 1900 was not. Now, what does this have to do with bugs in software? Many systems and
applications are designed to handle dates and times, but not all developers account for the rules of leap years correctly. When software is written without properly accounting for these, it can result in what we call the leap year bug. This bug occurs when systems fail to recognize February 29th, add or subtract days incorrectly, or mishandle dates around those leap years. The most famous case of a leap year bug occurred in 2000, the first leap year of the modern digital age. Some systems treated 2000 as a regular year because they followed the rule that said a year is not a leap year if it's divisible by 100. They completely forgot the exception that it could be
because it's divisible by 400. This oversight caused some applications to crash, others to display the wrong date, and others to completely malfunction. Issues arose in mainframe systems, financial software systems, and embedded systems like medical technology. For instance, some of the older mainframe systems and financial software systems incorrectly calculated dates, leading to erroneous billing cycles, miscalculated interest rates, and scheduling issues. Though the largest disruption occurred in February of 2000, there are major cases that occur every leap year. I've attached four examples of these disruptions here, but we'll be focusing on one of the largest disruptions in recent history. In 2012, Microsoft Azure's cloud platform experienced a leap year bug. Azure is one of the world's leading cloud services,
and its outage had a global impact. The problem occurred because a portion of the Azure infrastructure didn't account for the extra day in the leap year. When the system tried to calculate certificates based on date and time, it failed, causing a massive service disruption. As a result, businesses relying on Azure for hosting, data storage, and other services faced unexpected downtime. The outage lasted for over 12 hours, leading to significant operational and financial losses for companies around the world. While that slide showed some of the major well-known leap year bugs from the past few years, there are security concerns that arise from these all the time. Leap year bugs can cause systems to mishandle date-related data, leading to
improper storage or deletion. Sensitive data being lost or leaked due to an incorrect date calculation can lead to breaches that expose confidential information to unauthorized parties. Software improperly allocating memory when handling dates in a leap year can lead to memory leaks, where resources are consumed but not released, degrading system performance and creating openings for attackers to exploit the system. Some security systems rely on time-based authentication tokens, the way that Azure did, and those expire after a certain period. Leap year bugs disrupt the handling of these timestamps, which might cause some users to experience authentication failures and other, unauthorized users to gain access to systems. Overall, leap year bugs aren't just a nuisance; they pose serious risks to
security, data, and operations. Properly addressing these vulnerabilities is essential to ensuring that digital infrastructure remains secure and reliable. So how do we go about preventing these issues? For developers, the answer lies in testing and planning ahead. Modern software relies heavily on automated testing, and any system that handles time or scheduling should account for leap years. This might sound simple, but the sheer complexity of date calculations means that it's an extremely easy mistake for developers to make. Over time, many date and time libraries have been improved to handle these cases, but bugs still slip through. Therefore, we recommend the use of static code analysis. As Tong mentioned, that's the process of examining source code without executing it to identify
potential issues. Specifically, through CodeQL, teams can improve code quality, security, and maintainability before software is run. In this section I'll be going over two of the most common leap year bugs that we have found and some examples of them, and work through an example of writing a CodeQL query to find all common versions of this flaw. As Tong mentioned, and this is just a full-disclosure thing, Microsoft, which is our company, through GitHub did acquire CodeQL a few years ago. We're not on their product team; we don't do anything in terms of sales or anything. We're just speaking to this because it's the tool that we use internally and is the static
code analysis tool that we're most comfortable with. My aim in this part of the presentation is to talk about the capabilities that you would want in a tool, and the things that many static analysis tools can do as well. Before we get too deep into it, we should also understand what an anti-pattern is. An anti-pattern is a common but counterproductive solution to a recurring problem in software design or development. It often seems like a good idea at first, but over time can lead to negative consequences like poor performance, increased complexity, or maintainability issues. Recognizing and avoiding anti-patterns is important for creating more efficient, scalable, and sustainable systems. Currently, the main
usage of CodeQL is to find anti-patterns to flag. All of the queries that we write internally, and the queries that are required by our company, are based on anti-patterns that have been found by the public. For leap year, we had nine anti-patterns that we were working to solve before February of 2024. These revolved around the addition of unsafe values, combinations of date values that could be harmful, storage of values, and more. For this example, we're focusing on two of the existing leap year anti-patterns that are similar enough to understand together. While there are other leap year anti-patterns we could have focused on, these are just the
easiest to jump into and understand. Anti-pattern 2 typically occurs when a developer wants to add or subtract some number of years, typically one, to a date, but does so using a multiple of 365 days. With the introduction of a leap year, these could cause those dates to be off by one, leaving room for any of the security flaws that I was going over before. Adding 365 days to February 29th, 2024 leads to unknown behavior, where the date could either normalize to March 1st, 2025 or back to February 28th, 2025. The leap year doesn't necessarily need to be involved, though: subtracting 365 days from July 1st, 2019 gives July 1st, 2018, but
subtracting 365 days from July 1st, 2020 would give July 2nd, 2019, missing the expected July 1st. Anti-pattern 3 occurs in a similar manner, when a developer wants to add or subtract some number of months but does so using 28, 30, or 31 days. By not considering a leap year, they may be adding 28 days when in reality they mean to add 29. Similar examples: adding 28 days to February 1st, 2019 does give the correct date of March 1st, 2019, but adding 28 days to February 1st, 2020 gives February 29th, 2020 instead of the March 1st date that was realistically expected. This is the most common response; however, different date packages in
different languages have different behaviors for overflow of days in a month. Some may throw an exception, while others wrap to the next day, but since we don't always know the answer, that's why it is seen as an anti-pattern. Here are the two most basic examples of these anti-patterns in practice for JavaScript. For both, a new Date object is created and then a number of days is added as a constant using the setDate function. Each of these would be flagged by CodeQL for their use of 365 and 28 respectively. The correct solution in this case would be to use a proper date-time library rather than just doing simple constant addition like
this. Now, for the actual writing of a CodeQL query, we'll be focusing on anti-pattern 2, which, as said on the previous slide, looks at the addition and subtraction of 365 days. On the right is a sample file that shows some examples of the anti-pattern in practice. The first one shows the creation of the date followed by an addition of 365 days to that date using the setDate function. The second one does the same thing, but does so by putting 365 in a variable, just to show how the data flow or taint tracking could work in the scenario. And the third adds 365 days to a new Date object by creating the Date object from
a calculated timestamp. That one is specifically there to show why a simple Ctrl+F wouldn't work. Random other ways of creating the same problem can always come up, and using static code analysis is really helpful to just grab all of those. Based on these samples, we have three alerts we would like CodeQL to find. Does anyone have any ideas what lines they would be on, if you can read the line numbers from where you are? If you could guess and see it from where you're sitting, lines 3, 8, and 12 should be the ones that are alerted, with the entire statement with the addition being highlighted. Based on
this, we can begin the process of creating a query. As Tong mentioned before, I wanted to touch on taint tracking. This is something that is absolutely necessary for this query. Tong explained data flow before, how it traces the flow of information from one part of the code to another. CodeQL taint tracking, on the other hand, focuses on tracking the flow of potentially dangerous or untrusted data, known as taint, within a program. It identifies the sources of untrusted input and then follows that data to determine if it reaches vulnerable parts of the system without proper validation or sanitization. Taint tracking is different from normal data flow in that normal data flow libraries are used to analyze
information flow in which data values are preserved at each step, while taint tracking is used to analyze information flow where data values could have tainted a different variable. For example, if you're tracking an insecure object x and the statement y = x + 1 pops up, a normal data flow analysis will highlight the use of x but not y. However, since y is derived from x, it is influenced by that tainted information, and therefore it is also tainted. In this specific query example, we need to be using taint tracking, since we're following the path of an addition or subtraction of a value to the use of it in a date function. CodeQL makes taint tracking extremely
easy, so all we have to do is identify the sources of the taint and the sinks, and CodeQL will find the paths between them. Since we're looking for the sources and sinks, we're going to start with the sources. To find the sources, we need to find the specific pattern that leads to the problem. In this case, it's anything that involves the addition of 365 to another value. We look for this because, while this pattern doesn't necessarily mean it's the full problem we're looking for, it's necessary for that full anti-pattern to be found. Highlighted on the screen are the three examples of this from the sample code we're looking at, and if you
see, the actual date object plus 365 doesn't prove that there's a problem until it's actually used in a sink. It needs to flow into a sink that makes up the rest of the anti-pattern. In that case, it would be the use of the setDate function or the new Date constructor, respectively, that ends up flagging the code. These are the same highlighted portions as we showed earlier on lines 3, 8, and 12, because these are the final portions of the code that should be alerted. Now that we know that, it's finally time to actually start writing that query. To start, we first have to do the imports. We import javascript, as that's the
language that is currently being scanned. Each language has slightly different styles of query authoring, so that's a necessary line. The following two imports are data flow and the data flow path graph. Data flow allows for use of the data flow and taint tracking modules, and path graph enables path exploration and visualization of data flow through the code. Enabling path graph allows for the visualization from sources of taint to sinks in an interactive manner. Finally, we import ClassifyFiles to allow for the removal of auto-built and auto-generated files later on in the code. So now, looking at the code, we can begin to build the parts necessary to find the sources. To start, each of these
classes has the statement "extends something", so in this case it says class UnsafeYearArithmeticNums extends expression. CodeQL has built-in classes with methods that can be used. We usually end up extending the class that's most similar to the object we're trying to find. Resources that explain this more in depth are attached at the end of the presentation. First, we want to find all the places where there is a number 365. This is the first step in building the source, as there has to be an addition involving the value 365. To do this, we create a CodeQL class that finds either the number 365 or 525,600,000, which is that timestamp
value measured out and then there's also the case that someone when they're trying to figure out dates does 365 * 2 or 365 * 3 instead of just like adding years they just multiply out for the number of days so the third statement is finding any multiplication that does that and it's a little bit of a complex looking thing but it basically says that given a multiplication expression if one of the operands in that expression is the integer value 365 it counts um when we run an evaluation of this class the three orange boxes that are highlighted on the left would be found um using the unsafe year arithmetic nums class we can find any
of the addition or subtraction that may be happening in the code the next class arithmetic year ops begins by finding an addition or subtraction sign like the plus or minus then it looks for two possible options the first is the most basic option where there is an addition or subtraction happening and one of the operands of that is an unsafe year arithmetic number the second um case is that the unsafe year arithmetic number is stored in a variable in that case it looks for the addition or subtraction expression finds that one of the operands is a variable and then finds that that variable has one of those unsafe numbers stored in it
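
As a rough illustration of the two source checks just described, here is a small JavaScript sketch. It is not CodeQL and it is not the speakers' query: the node shapes, the function names isUnsafeYearNumber and isArithmeticYearOp, and the sample expressions are all hypothetical, and only the literal 365 is handled (the second timestamp-style constant mentioned in the talk is left out).

```javascript
// Toy illustration only: plain JavaScript over hand-built nodes, not CodeQL.
// The node shapes and function names below are hypothetical.

// Mirrors the "unsafe year arithmetic nums" idea: the literal 365, or a
// multiplication in which one operand is the literal 365 (e.g. 365 * 2).
function isUnsafeYearNumber(node) {
  if (node.type === "num") return node.value === 365;
  if (node.type === "binary" && node.op === "*") {
    return [node.left, node.right].some(
      (operand) => operand.type === "num" && operand.value === 365
    );
  }
  return false;
}

// Mirrors the "arithmetic year ops" idea: a + or - expression where one
// operand is an unsafe number, either directly or through a variable.
function isArithmeticYearOp(node, variables) {
  if (node.type !== "binary" || (node.op !== "+" && node.op !== "-")) {
    return false;
  }
  return [node.left, node.right].some((operand) => {
    if (isUnsafeYearNumber(operand)) return true;
    return (
      operand.type === "var" &&
      variables.has(operand.name) &&
      isUnsafeYearNumber(variables.get(operand.name))
    );
  });
}

// Small helpers for building nodes by hand.
const num = (value) => ({ type: "num", value });
const ref = (name) => ({ type: "var", name });
const binary = (op, left, right) => ({ type: "binary", op, left, right });

// Variable initialisers in the made-up program: daysInYear = 365, offset = 30.
const variables = new Map([
  ["daysInYear", num(365)],
  ["offset", num(30)],
]);

const direct = binary("+", ref("day"), num(365)); // day + 365
const viaVariable = binary("-", ref("day"), ref("daysInYear")); // day - daysInYear
const twoYears = binary("+", ref("day"), binary("*", num(365), num(2))); // day + 365 * 2
const harmless = binary("+", ref("day"), ref("offset")); // day + offset

console.log(isArithmeticYearOp(direct, variables)); // true
console.log(isArithmeticYearOp(viaVariable, variables)); // true
console.log(isArithmeticYearOp(twoYears, variables)); // true
console.log(isArithmeticYearOp(harmless, variables)); // false
```

The first three expressions would count as candidate sources in this toy model and the last would not. A bare 365 * 2 on its own is not a source here, only an addition or subtraction that uses it.
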
using this um arithmetic year ops class we can find any of the possible sources of this sample code as seen with those blue highlighted boxes and you can see how the orange kind of goes into that blue now that we have those sources figured out we can move over to the sinks the first sink possible is the set date class to find this using CodeQL there's a simple class that can be made that looks for any calls that are specifically called setDate running this CodeQL class will find those two purple highlighted boxes um the second sink the new Date constructor is a little bit more complicated but still the exact same idea in the case needed for this antipattern
the number 365 must be affecting the year value of that new Date instantiation however when using a new Date constructor the year month or day can all be set so our code needs to be able to specifically grab that year argument as a sink to do this we can make a class that has a characteristic predicate that makes sure that it's a date which we do with the getCalleeName equals Date and then have multiple other um methods within it that can identify the year month day arguments since each of those are always set at a very specific location 0 1 or 2 respectively the code um can look for the value at those specific arguments um
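
To make the two sinks concrete, here is a minimal JavaScript sketch of what they do at runtime. The dates are made up for illustration (the sample file from the slides is not reproduced in this transcript), and toLabel is a hypothetical helper. It shows the argument positions of the Date constructor and what happens when a day-of-month plus 365 flows into setDate.

```javascript
// Made-up dates for illustration; local-time Date methods are used throughout.

// new Date(year, monthIndex, day): the arguments sit at positions 0, 1 and 2.
const leapDay = new Date(2024, 1, 29); // monthIndex 1 is February
const parts = [leapDay.getFullYear(), leapDay.getMonth(), leapDay.getDate()];

// The flagged shape: "one year later" written as day-of-month + 365.
const drifted = new Date(2024, 0, 1); // 1 January 2024, a leap year
drifted.setDate(drifted.getDate() + 365); // setDate rolls over month boundaries

// toLabel is a hypothetical helper that formats the local date fields.
function toLabel(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
}

console.log(parts); // [ 2024, 1, 29 ]
console.log(toLabel(drifted)); // 2024-12-31
```

Because 2024 has 366 days, adding 365 days to 1 January 2024 lands on 31 December 2024 instead of 1 January 2025, which is the kind of off-by-one-day result this query is meant to surface.
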
running this date instantiation core class finds the three pink boxes highlighting the new Date and then running get year arg specifically finds that red highlighted box are there any questions before we keep going on the source and the sink stuff so it can um also catch the correct implementation which is also always one of the problems of static code analysis what we try to do is we typically look for those false positives so that way we can specifically filter them out in this case the correct way of doing it would not be to use like a set date like that with an addition it would be to use a set year function which does exist in the same
class so that correct version of it that's found with the antipattern and in the antipattern explanations online um isn't going to be flagged because we don't look for set year most of the library packages that you can use for this stuff do actually already take care of that the problem is usually when developers are doing it more manually because that's just the habit that we all have continuing on um now that we've had those we can add them to the taint tracking configuration um this module exists in CodeQL to use the taint tracking capability so we start out by making the characteristic predicate of your data flow that just explains what
this module is doing and then in this case it says finds where 365 is used in a year addition or subtraction then we have to override the isSource and isSink predicates and each of the source and sink have to be written out in this case the source is arithmetic year ops as we found out earlier so line 70 says that if an arithmetic year ops object exists it could be a source so save it as a starting node um in this case it's once again all of those purple boxes um the next two lines remove all of the library and autobuild files that typically come along with making a um JavaScript project so that way we don't get an
influx of results that we can't control um within the isSink predicate we know that it either has to be a set date object or a date instantiation core object so what this is saying basically is that if a set date exists that could be a sink so just save it as an endpoint or if a date instantiation core object exists once again could be a sink so save it as an endpoint behind the scenes when we run this CodeQL will look for any paths between those starting nodes and those ending nodes and if one exists that means there is a path from a bad start to a bad finish and we have possibly
found an example of a true positive for the addition or subtraction of 365 days um now that all of that is done we can finally write the select statement and that's the true part of a CodeQL query it includes a from where and select statement the from statement chooses the variables that will be used in the query the where statement defines the bounds of what would be chosen and the select statement is the output of the overall block of code here we say that given a source and a sink where there is a year data flow with a path from that source to that sink um we can select those values with a message
explaining the problem this is similar to an SQL select statement though some of the syntax is just weird and it's something you end up having to get used to and you look at the references a lot um while I'm not running this live it's really similar to how Tong's example from earlier was run the alerts that are found are underlined in yellow on the left side of the screen and that's what the code shows and then below the actual query statement is what the alerts look like in terms of when you run them the messages that you get uh with the same blue lines that if you click on them they would bring you to the correct
place in the code um now at this point we have a properly flowing query that will summarize the antipattern shown so this example only shows this working on a 20 line file how do we actually use this in practice we use multi-repository variant analysis also known as MRVA this is a powerful tool for examining code over different repositories MRVA ensures consistency and highlights the impact of changes on each of the code bases here MRVA allows for CodeQL to be run on up to a thousand repositories on GitHub from VS Code at a time leap year queries or really any queries that find problems um can be tested on large amounts of data to find
a majority of instances of a security flaw as um a consumer or an external person if you would want to use a similar process with MRVA GitHub has a CodeQL database for every repository that has enabled code scanning this means that your queries can be run on any of your own GitHub code bases and you can make them yourself and run them personally and with that we've reached the end CodeQL and really static analysis as a whole are very useful tools for security and finding bugs before major events occur having something like this that allows for large codebase scanning and finding every or nearly every instance of a problem that can then be sorted and
fixed is a tool that shouldn't be taken lightly we've attached extra resources that can connect you to um more about PQC and CodeQL as a whole please let us know if you have any questions we also really quickly wanted to thank our team back at Microsoft for their help in finalizing this talk and all of their suggestions along the way and here are all those resources are there any questions so that's the nice thing with using like the taint tracking or the data flow things it does show you every single step of that process and unfortunately it can't be run live because one of the problems with um data flow and taint tracking and
something that Tong brought up earlier is that it is extremely slow um but when you were to run a data flow it would actually highlight every single line that one of those happens on so the way that this is showing up with those squigglies under the yearBad.getDate is that it highlights the argument of the final call that is the problem but with the data flow it would show you where the 365 is where the addition is where the addition moves to throughout the process and then finally where it ends up and then the other issue is like stuff like with timestamps or if they're using like bigger amounts of days those were always the problems that we had
with it but I actually have not thought about gred before no yeah um honestly I don't think I will say it was not as hard to understand as it was to explain um but um I don't actually know how close Copilot is we um just Microsoft as a whole has been working more to get Copilot more integrated into things but especially with security it's always like a little bit of an iffier statement um so that's actually more on the product side of things which is the GitHub team which we don't work with but I believe there is an effort as far as I know but the bigger issue is more just like object oriented code versus
scripting languages have such different ways of writing languages that it's really hard for an AST um like a syntax tree to actually be able to like manipulate them the same way which is why all of those languages end up being mixed up a little bit yes um I don't believe it's easy however I have met people who have done it um do you know the answer to that one like extending it to a yeah different internal language I was incorrect it only supports those several big languages yep yep so that's exactly what we do typically the life cycle process of this is running it like this so actually the way that I was running it
when I was creating this presentation was all locally on a sample test that I had made to find false positives true positives whatever and then we run it in a like a little bit of an environment to see how many false positives we find when we run it on a larger amount of code and it's just a cycle until we get the false positives down to like less than I think like 5% okay thank you so much [Music]