
Thanks for coming. My name is Joxean and I want to talk about a program diffing tool like BinDiff, which is the best-known tool similar to this one. The name of the open source project is Diaphora. The word comes from Greek; in Greek it means difference. It is open source, GPL. It is an IDA plugin, at least for now. Maybe tomorrow it will become an independent tool, but for now it is specifically written for IDA. It performs program diffing, which is typically referred to as binary diffing. It can be used, for example, for patch diffing: finding how a vulnerability was fixed in a closed source product. It can also be used for porting symbols between IDA databases, or for finding new functionality that was added to a closed source program. Another popular use of these tools is detecting plagiarism: if some company used the source code of some open source project without providing the source it needs to, or some company just stole the source code from another company and integrated it, we can use a binary diffing program to find out whether it is actually the same code, or whether both started out from the same code base.

First of all, I think I have to explain why I wrote one more program diffing tool. There were many reasons to write my own one, yet one more program diffing tool, because this is not the first one; there are many. The main tool I used to use was Zynamics BinDiff. Unfortunately it has not been updated lately, for maybe four or five years, and it lacks many features that I require. I don't understand why they didn't add them, maybe because they don't care about such features, but I really needed them. So before writing my own tool from scratch I took a look at the other open source projects that are out there, but either they were not as good as I expected or I considered it too hard to adapt those projects to my needs. So finally, and because I mostly prefer to write my own tools, I discarded all of them and started writing my own from scratch.

A bit more about the why. I typically work doing vulnerability research, and many times I need to port my work from an old version of my target to a new version. For example, if you are researching a specific component of Internet Explorer and you have all your comments, structures and so on, and then a new version of Internet Explorer appears, you want all your comments, your structures, your enumerations, etc. They should be
easily portable between the old version and the new version. Some things can be imported with, for example, Zynamics BinDiff, that commercial tool. But I am a heavy user of structures, enumerations, unions and all of that, together with the decompiler, because once you apply structures, enumerations, unions, etc. you get a very nice idea of what the function you are researching really looks like in C, and that is easier than looking at the disassembly. But Zynamics BinDiff, even being the best tool on the market for program diffing, doesn't support structures, enumerations or anything else, only importing symbols from one database to another.

I have also seen many, many cases where functions are very similar in the pseudo-code, in the decompiled code, and very different in the assembly. If you press F5 and you see that functions that differ at assembly level actually translate to the same C code, you know that it is the same function. But the other tools didn't apply any heuristic using the pseudo-code. I wanted to use it because I knew it would give better matches. Also, the other tools used to work really badly when you were matching, for example, one program for ARM against one for AMD64, that is, when comparing not the same architecture but different ones.

In the past, before writing this project, I used to write many project-specific Python scripts to port my structures, enumerations, etc. for each project, and I decided that it was better to write a generic tool than a project-specific script for every new project. The first usable version of Diaphora as a program diffing tool took me about a week, more or less, and at least in my tests it worked better than Zynamics BinDiff, Turbodiff, DarunGrim and other tools.

Okay, so how it works internally. Diaphora does the following: simply, it exports the IDA
databases that we are going to compare to my own SQLite format. Then it compares them by simply running SQL queries over all the attributes from both databases, and that's all. It is as simple as it sounds. However, naturally, it is more complex than that.

Okay, how it works when exporting. It exports the following elements from the IDA database: functions, with all the related attributes of the functions, like flags, the prototype, relative addresses, the number of basic blocks, the number of edges and the cyclomatic complexity; all the related information for the basic blocks; all the related information for the instructions; the pseudo-code of each function; and also the abstract syntax tree, the AST, of each function's pseudo-code. The abstract syntax tree is a custom tree that describes the pseudo-code in the language that Hex-Rays produces.

For each function or basic block, some additional information is calculated. I prefer to calculate such things at export time, because you export once and then you diff many times. One of the things that is calculated is, for example, a hash based on the bytes, that is, on the non-changing bytes of each instruction. If it is, for example, mov eax, some offset, I discard the offset and only take the bytes that will not change because of relocations or because of the displacement.
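As a rough illustration of the non-changing bytes idea, here is a minimal Python sketch. It is not the tool's actual code: the function name `bytes_hash`, the use of MD5, and the hand-written lists of "variable" byte positions are all assumptions made for the example. In a real exporter a disassembler would say which bytes encode an address, offset or displacement.

```python
import hashlib

def bytes_hash(instructions):
    """Hash only the bytes that survive relocation.

    `instructions` is a list of (raw_bytes, variable_offsets) pairs, where
    `variable_offsets` is the set of byte positions that encode an address,
    offset or displacement. Those positions are written by hand here
    (assumption); a disassembler would normally provide them.
    """
    stable = bytearray()
    for raw, variable_offsets in instructions:
        stable.extend(b for i, b in enumerate(raw) if i not in variable_offsets)
    # MD5 is only a convenient digest for the sketch; any hash would do.
    return hashlib.md5(bytes(stable)).hexdigest()

# mov eax, 0x00401000 ; ret   (B8 = mov eax, imm32; C3 = ret)
old_build = [(bytes.fromhex("b800104000"), {1, 2, 3, 4}),
             (bytes.fromhex("c3"), set())]
# The same code after the referenced data moved to 0x00402000.
new_build = [(bytes.fromhex("b800204000"), {1, 2, 3, 4}),
             (bytes.fromhex("c3"), set())]
# A genuinely different instruction: mov ecx, imm32 (B9).
other_code = [(bytes.fromhex("b900104000"), {1, 2, 3, 4}),
              (bytes.fromhex("c3"), set())]

same_after_relocation = bytes_hash(old_build) == bytes_hash(new_build)
```

The two builds differ only in the immediate operand, so they hash the same, while a change in the opcode byte produces a different hash.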
I also calculate the cyclomatic complexity of each function, the strongly connected components of the flow graph of each function, the topological sort of the flow graph, the small primes product (I will explain it a bit later) for each strongly connected component, a set of three fuzzy hashes of the pseudo-code text, and an AST hash based on the abstract syntax tree of the pseudo-code, which is again, as always, a small primes product that I will explain later on. I also calculate the number of loops, and I store all the switch structures, that is, all the values that are used in each switch, all the switch cases.

Okay. Also, at the database level, not at function level, I calculate the following: a fuzzy call graph hash based on the small primes product of the cyclomatic complexity of each function. That is, I assign a prime number to the cyclomatic complexity of each function, then I multiply the prime numbers of all functions together and I use the result as a hash. This is the fuzzy call graph hash. I also store all the primes that I used for the calculation so I don't have to calculate them again and again. The small primes product hash of the call graph is used to determine how related two binaries are. It can tell you whether the call graphs of the two programs are equal.
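The small primes product used for the call graph hash can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names, the size of the prime table, and the choice to use the cyclomatic complexity directly as an index into that table are mine, not taken from the tool.

```python
def first_primes(count):
    """Return the first `count` primes by trial division (fine for a small table)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# Computed once and reused, in the spirit of storing the primes used.
PRIMES = first_primes(512)

def callgraph_hash(complexities):
    """Multiply the prime assigned to each function's cyclomatic complexity."""
    product = 1
    for cc in complexities:
        # Assumption: the complexity value indexes the prime table,
        # clamped so very complex functions share the last prime.
        product *= PRIMES[min(cc, len(PRIMES) - 1)]
    return product

old = callgraph_hash([1, 3, 3, 7, 12])
reordered = callgraph_hash([12, 7, 3, 1, 3])   # same functions, different order
patched = callgraph_hash([1, 3, 4, 7, 12])     # one function changed
```

Because multiplication is commutative, the order in which functions appear does not matter, which is what makes the product usable as a structural fingerprint: `old` and `reordered` are equal, while `patched` differs.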
If the hash is equal, the two are supposed to be structurally the same program; if there are very small changes, say the call graph differs by 0.1 percent, you know that something like one function was changed.

Okay, comparison. As of now Diaphora compares two IDA databases that were exported to the SQLite format; it cannot compare, for example, two binaries directly or something like that. Diaphora simply compares them using SQL queries over all the attributes that were extracted before. The SQL queries that are launched try to match as many functions as possible using many different heuristics: first the most robust ones, the heuristics that are likely to cause the fewest false positives, and then the less robust ones, the ones that are going to cause a lot of false positives. Then, for each match, a similarity ratio is calculated, and finally the results are classified as best, partial or unreliable.

Okay. As of today, Diaphora has 44 heuristics implemented for finding matches between two different databases. Some examples are equal or similar raw assembly or pseudo-code. It means that if the assembly in the two databases is the same, we have a match; if the assembly is different but the pseudo-code is the same, we have a match. I do it both raw, and by raw I mean the assembly as you get it from IDA or the pseudo-code as you get it from IDA, and cleaned up: for example, if you have those prefixes like, I don't know, off, an underscore and an address, I simply remove all of that part and then compare the cleaned-up versions. Other heuristics are the bytes hash and names, which uses the hash I calculated with the non-changing bytes of each function and also the true names referenced from that function; if they are the same, we have a good match. Same name: if you are reversing
for example, patches from Microsoft, we have symbols, so we can simply compare the function names and that's all, we have a match. Another one is all or most attributes of the functions: if the number of basic blocks, the number of edges, the cyclomatic complexity and the number of imported functions are all the same, we can conclude that it is actually the same function, so we have a match. Another one is the mnemonics small primes product: a prime is assigned to each mnemonic, to each assembly instruction, then all the primes are multiplied together, and if the final hash is the same, we can conclude that it is the same function but that maybe some instructions were reordered. And there are other heuristics; I will explain some of them in more detail later.

Okay. So for each match that we get from the SQL queries, a similarity ratio is calculated using three methods. The first one is the easiest one: I simply use the ratio function of SequenceMatcher, from Python, comparing either the assembly or the pseudo-code. It basically gives a quick relative ratio between two strings, so using a cleaned-up version of the assembly and of the pseudo-code we have a very quick way of checking how related they are.
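The text-based ratio can be tried directly with Python's standard `difflib` module. The sketch below is only an illustration: the helper name `text_ratio` and the two pseudo-code snippets are made up for the example, and the talk does not say which of SequenceMatcher's ratio variants is used.

```python
from difflib import SequenceMatcher

def text_ratio(a, b):
    """Similarity between two strings, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical pseudo-code of the same function before and after a patch.
old_code = "strcpy(dest, src);\nputs(dest);"
new_code = "strncpy(dest, src, 63);\nputs(dest);"

unchanged = text_ratio(old_code, old_code)
patched = text_ratio(old_code, new_code)
```

`SequenceMatcher` also offers `quick_ratio()` and `real_quick_ratio()`, which are cheaper upper bounds on `ratio()`; either could be swapped in when speed matters more than precision.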
When we have a match, I do it with both the raw version of the assembly and the pseudo-code and with the cleaned-up version of the assembly and the pseudo-code. I also calculate a different ratio based on the small primes product of the abstract syntax tree. For each expression, for each instruction element in the abstract syntax tree, I assign a prime number, then I multiply all of them, and if the hash is the same, even if structurally the AST looks different, we can conclude that it seems to be the same function, maybe with reordered instructions. What I do in the case of the AST small primes product based hash is remove all the primes that are in both sets and only consider the primes that are in one of the two sets but not in both. I calculate the similarity ratio based on that, and that's all.

Okay. Then, according to the ratio, the matches are sorted into the best, partial or unreliable results. The best results are those with a similarity ratio of 1.0. So, for example, if you are looking at how a vulnerability was fixed, for patch diffing, you simply ignore everything in the best matches. Partial matches are those with a similarity ratio bigger than or equal to 0.5, in general. Some heuristics are
considered reliable even when the ratio is lower than 0.5. For example, the same name heuristic compares functions that have the very same name, not IDA-generated names like sub, underscore and an address, only true names like, I don't know, CreateFile or something like that. When comparing Microsoft patches, for example, it helps a lot: even if one function was heavily modified, so the similarity ratio is like 0.2 or something like that, we know that it is still a good match because we have the same name. The last results are the unreliable ones, and those are the ones with a similarity ratio of less than 0.5, or the ones where the heuristic that found the match is known to cause too many false positives. For example, I use one heuristic that simply compares the number of loops in a function. If a function has, say, five loops, that is a very small number of loops and it can cause a lot of false positives. On the other hand, if we have like 100 loops in a function, and the same in the other one, we know that it is going to be a good match, but that is not going to happen very often. So this one is considered overall an unreliable heuristic, and partial matches found by this heuristic are automatically put with the unreliable ones.
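The three result buckets can be summarised as a small decision function. This is a sketch of the rules as stated in the talk, not the tool's code; the heuristic names in the two sets and the function name `classify` are hypothetical labels chosen for the example.

```python
# Hypothetical heuristic labels, used only for this illustration.
TRUSTED_HEURISTICS = {"Same name"}                 # reliable even below 0.5
UNRELIABLE_HEURISTICS = {"Same number of loops"}   # noisy unless the match is perfect

def classify(ratio, heuristic):
    """Put a match into the 'best', 'partial' or 'unreliable' bucket."""
    if ratio == 1.0:
        return "best"
    if heuristic in UNRELIABLE_HEURISTICS:
        return "unreliable"
    if ratio >= 0.5 or heuristic in TRUSTED_HEURISTICS:
        return "partial"
    return "unreliable"

examples = [
    classify(1.0, "Bytes hash and names"),   # identical functions
    classify(0.2, "Same name"),              # heavily modified, name still matches
    classify(0.8, "Same number of loops"),   # high ratio, but noisy heuristic
    classify(0.3, "Bytes hash and names"),   # low ratio, ordinary heuristic
]
```

The order of the checks matters: a perfect ratio wins first, then the known-noisy heuristics are demoted, and only then does the 0.5 threshold (or a trusted heuristic) decide between partial and unreliable.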
Okay, some heuristics. As I said, Diaphora uses 44 of them as of today. Some are rather simple, for example the same name heuristic, and others are really complex. Some of them are very reliable, like the same name, and others are rather unreliable, like the number of loops. Let's see some of them.

This is one that we already discussed: bytes hash and names. It simply compares the non-changing bytes of each instruction and the referenced true names. If the bytes from the instructions are actually the same and the true names of, I don't know, the constants used or the global variables are the same, we know that we have a really, really good match. A true name is a name like a function name or a string name, as opposed to a name that is auto-generated by IDA, so it is not sub, underscore and an address, or off, underscore and an address, or dbl, underscore and an address. This is one of the simplest and best heuristics implemented here. This is the whole code of the heuristic. Remember that I calculate everything that I need at export time, so export time is when most of the work is done, and at comparison time I simply run SQL queries.
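To make the "a heuristic is just an SQL query" idea concrete, here is a self-contained sketch using Python's `sqlite3` module with two in-memory databases. The table layout (`functions` with `address`, `name`, `bytes_hash`), the attached database alias `diff`, and the sample rows are assumptions invented for the example; the real export schema is not shown in the talk.

```python
import sqlite3

SCHEMA = "CREATE TABLE {db}.functions (address INTEGER, name TEXT, bytes_hash TEXT)"

def load(conn, db, rows):
    """Create the hypothetical functions table in database `db` and fill it."""
    conn.execute(SCHEMA.format(db=db))
    conn.executemany(f"INSERT INTO {db}.functions VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS diff")  # second exported database

load(conn, "main", [(0x1000, "foo", "aa11"),
                    (0x1040, "sub_1040", "bb22"),
                    (0x1080, "main", "cc33")])
load(conn, "diff", [(0x2000, "foo", "aa11"),
                    (0x1040, "sub_1040", "bb22"),
                    (0x20A0, "main", "dd44")])

# Match functions whose true name and bytes hash agree in both databases.
# Names starting with sub_ are auto-generated, so they are not true names.
QUERY = """
SELECT f.name, f.address, df.address
  FROM main.functions f
  JOIN diff.functions df
    ON df.name = f.name AND df.bytes_hash = f.bytes_hash
 WHERE substr(f.name, 1, 4) != 'sub_'
"""
matches = conn.execute(QUERY).fetchall()
```

Only `foo` is reported: `main` has the same name but a different hash, and `sub_1040` is excluded because an auto-generated name carries no information.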
That is the whole heuristic: in this case it simply checks, across both databases, whether the names are the same and the bytes hash is the same. Okay, this is one of the most common heuristics in program diffing tools; the same heuristic is available in Zynamics BinDiff, DarunGrim, Turbodiff and any other program diffing tool.

In Diaphora I have many heuristics that are specific to this tool. Some of them are: the same cleaned-up assembly or pseudo-code, because it is the only tool that uses the pseudo-code for finding matches as of today; the pseudo-code fuzzy hash and the abstract syntax tree hash, which, as it is the only tool using the pseudo-code of the decompiler, are specific to Diaphora; the strongly connected components small primes product, which is based on the graph, and although most program diffing tools have heuristics based on the graph, as far as I know it is the only tool using that one; and, for example, switch structures.

Okay, how the same cleaned-up assembly or pseudo-code heuristic works. It is partly based on the pseudo-code generated by the Hex-Rays decompiler. It generates a clean textual representation of the assembly, the pseudo-code, or both, that can be used for comparing. For example, if you have something like mov eax, off, underscore and an address, it will simply remove the part that is auto-generated by IDA and it will be like mov eax, XXX. If we have in the pseudo-code from the decompiler int v1 equal to v2 plus whatever, it will remove the auto-generated names of the variables and it will be like int X equal to X plus the constant. This is a very simple heuristic that works pretty well overall and causes a very low rate of false positives.

This is how the heuristic works internally. This is a cleaned-up version of the assembly for both databases, and you can see there is no label, there is no IDA auto-generated name.
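The clean-up step can be approximated with two regular expressions. This is a sketch under assumptions: the prefix list covers only a few of IDA's auto-generated name prefixes, the variable pattern only handles names like v1 or a2, and the function names are invented for the example.

```python
import re

# A few IDA auto-generated prefixes followed by a hex address (not exhaustive).
AUTO_NAME = re.compile(r"\b(?:sub|loc|off|dword|byte|unk|dbl)_[0-9A-Fa-f]+\b")
# Decompiler-style auto-generated variables and arguments: v1, v2, a1, ...
AUTO_VAR = re.compile(r"\b[va]\d+\b")

def clean_assembly(line):
    """Replace auto-generated names in a disassembly line with a placeholder."""
    return AUTO_NAME.sub("XXX", line)

def clean_pseudocode(line):
    """Replace auto-generated names and variables in a pseudo-code line."""
    return AUTO_VAR.sub("X", AUTO_NAME.sub("XXX", line))

asm_old = clean_assembly("mov eax, off_401000")
asm_new = clean_assembly("mov eax, off_402A10")
kept = clean_assembly("call CreateFileA")
pseudo = clean_pseudocode("int v1 = v2 + 16;")
```

After cleaning, both `mov` lines become `mov eax, XXX` and compare equal, the pseudo-code line becomes `int X = X + 16;`, and the true name `CreateFileA` is left untouched, in line with the choice described in the talk to strip only auto-generated names.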
In this example there are some differences, but we know we have a good match because the overall work is the same, and the only thing that is changing is actually this one: here there is an IDA auto-generated name, and in the other database we have a proper name. I only remove IDA-generated names; I don't remove the names that are actually user names. Some people told me that this is maybe an error, because I should also ignore all names. For example, for this function pair the similarity ratio that is generated is like 0.993 or something like that instead of 1.0. If I also removed that proper name, the similarity ratio would be 1.0. However, I prefer not to do so, because in this case it is clear that it is the same function, but in other cases, let's say that there is a function call whose name I removed or ignored, we could be matching, for example, a function that adds an element with a function that removes elements. That is why I don't do that. Writing heuristics for program diffing tools is harder than it looks before you actually start writing them, because when you start you are trying to match anything, and any heuristic that is not written very carefully is going to cause a ton of false positives, and you don't want to have false positives.

Okay, one more heuristic: the pseudo-code fuzzy abstract syntax tree based hash. It relies, naturally, on the Hex-Rays decompiler for now; tomorrow I plan to use perhaps the Snowman decompiler, which is open source. It takes each instruction in the AST and assigns a prime number to it: for example, the if instruction gets one prime, the do instruction is five, the switch instruction is, I don't know, 13. Then all the primes that are assigned are multiplied together, and that is the fuzzy abstract syntax tree hash that is generated. For now only perfect matches, those for which the final hashes are equal, are considered.
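The AST hash, and the ratio mentioned earlier in the talk that compares the primes left over once the shared ones are removed, can be sketched as follows. Everything concrete here is an assumption: the prime assigned to each node type, the flat list standing in for a tree walk, and in particular the exact ratio formula, since the talk only says that primes present in both hashes are discarded and the ratio is computed from what remains.

```python
from collections import Counter

# Hypothetical mapping from AST node type to a small prime.
NODE_PRIMES = {"if": 2, "do": 5, "while": 7, "for": 11,
               "switch": 13, "return": 17, "call": 19}

def ast_hash(node_types):
    """Multiply the primes of every node visited in the tree."""
    product = 1
    for node_type in node_types:
        product *= NODE_PRIMES[node_type]
    return product

def factorize(hash_value):
    """Recover the multiset of node primes from the product."""
    factors = Counter()
    for prime in NODE_PRIMES.values():
        while hash_value % prime == 0:
            factors[prime] += 1
            hash_value //= prime
    return factors

def ast_ratio(hash1, hash2):
    """Assumed formula: 1 minus the share of primes found in only one hash."""
    f1, f2 = factorize(hash1), factorize(hash2)
    only_in_one = sum(((f1 - f2) + (f2 - f1)).values())
    total = sum((f1 | f2).values())
    return 1.0 - only_in_one / total if total else 1.0

original = ast_hash(["if", "call", "call", "return"])
reordered = ast_hash(["call", "if", "return", "call"])
modified = ast_hash(["if", "call", "switch", "return"])
```

Reordering the nodes leaves the hash unchanged, so `original == reordered`; replacing one `call` with a `switch` leaves two unshared primes out of five, which under this formula gives a ratio of 0.6.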
In the future I will also consider partial matches, where not all the primes are the same but, say, I don't know, ninety percent of them are actually the same. For long functions, I mean for big functions with a good number of instructions, it is really useful for finding good matches.

More: the strongly connected components small primes product. Using a small primes product again, as with the previous heuristic, it assigns a prime to each strongly connected component, based on the number of strongly connected components of each function. The final product of the calculated primes is the hash. For now only perfect matches are considered; in the future partial matches will also be considered, but not for now.

Okay, switch structures. It considers the total number of cases as well as the actual values of the switches; both are used to match functions. There are switch values that can cause a lot of collisions, like switch whatever, case 1, case 2, case 3, default, which are pretty common. But other cases are not that common, like switch whatever with cases such as 0xFFFF, or other cases that are uncommon. In such cases this heuristic actually finds very, very good matches that all the others miss. Before implementing it I thought that this heuristic would cause too many false positives and would be very unreliable, but reality proved otherwise, because it works better than other heuristics that I thought at first would cause little or no false positives at all. What you have in mind before writing such a tool is very different from what you really find when you start writing it.

Okay, time for a demo. I will diff some very, very small samples, then show an example where I simply port symbols from one database to the other, and also how to find new functionality. Yep, so this is a vulnerable program. Okay, so it takes
whatever is given to the program, copies it into a stack-based buffer and then prints it out. Okay. This is a second version with at least one of the bugs fixed: in this case, instead of copying anything that is passed to that function, it only copies the exact amount of bytes that fits in the stack-based variable, and then it prints it out again anyway. And the final version, where it no longer has the format string vulnerability. Okay. So I compiled both programs, sorry, the three programs, with a compiler for x86 and for AMD64. I didn't have one for ARM on this computer, but anyway I would have done the same with a MIPS compiler or whatever. I also compiled for Windows and for Linux.

I will start with the very first program, this one. Here we have the main function and the foo function. What I will do now is simply export this program, this IDA database, to my own format. This is the script that I have to run, Diaphora. It is completely written in Python, so you can simply run it from here, like this. Okay, this is the small window of Diaphora. It asks for the path to store the database that we are going to generate, to export this IDA database. We can simply press OK to accept the default options. It asks if we want to overwrite it; you say yes. It starts exporting everything. Okay, everything is exported; this database is rather small.

And now I'm going to open the second binary. Okay, this is the second version of the program, where the string copy bug was fixed. Again I will run the Diaphora script, but this time I'm going to use one more thing. First, yes, I want to export the current database, but I also want to diff against the previous database, which was this one. Okay. Yes, because I want to overwrite it as before. And it starts comparing. First it exports, and look: it says the call graphs are equal. Do you remember the small primes product that I calculate from the call graph? The call graph hashes of both databases are equal, so the programs seem to be structurally equal, which is actually true: at function level it is the same program, but at instruction level it is not. If we go to the best matches window, we have all the best matches, the matches with a similarity ratio of 1.0, and the description, the heuristic that actually found each match. Okay, if we right click we can diff assembly; this is
actually the same function, so we have no changes. We can diff, for example, the pseudo-code. We can diff in a graph: instead of watching only the pure assembly instructions, we can watch the graph, the basic blocks.

Let's go to the partial matches window this time. Here we have some matches, which are in red, with a similarity ratio of 0.0, because both names look very similar but they are actually not the same function. We can ignore them. And then we have a very, very good match with a similarity ratio of 0.89. It says that the pseudo-code is very, very similar. If we right click and diff the pseudo-code, we will see that the only difference is that it is now calling strncpy instead of strcpy. So how the developers fixed that vulnerability is very clear in the diff.

Let's do the same again, but with the other database. This time I will open the third one and diff it against the second one. Yes, we want to overwrite. And it starts finding matches; well, first it starts exporting the database. And again it says the same: the call graph of both programs is actually the same. Then we go to the partial matches, where we only have one single partial match, because all the other matches are best matches, and we can right click, diff pseudo-code, and see what was fixed. The compiler uses puts in the last version, where before it used a call related to printf. In this case we have the destination, which was the local stack variable. But really, the vulnerability that was fixed is clear in this code, because it is in the printf call.

These examples are very, very easy, but we can also diff programs for different operating systems or different architectures. For example, I will use test1; this is the same program but for Windows. I will export it and compare it to the other version. This is the program, this is test2: it will compare the second version of the program for Linux to the first version of the program for Windows. It is now exporting. And here we have the partial matches. We have to consider that when the programs are for different operating systems, even in the case that we have the same name, the matches cannot be considered best matches, only partial matches. Here we have the foo function, the one that was patched. We right click, diff the pseudo-code, and we
can see what was changed: in the first database it was the strcpy and in the second one the strncpy.

One more demo. This is another binary, for Linux on AMD64. We have a lot of symbols because they didn't strip them, and we also have some structures; yeah, we have a lot of structures. Okay. We can export everything from this database: the structures, the enumerations, the names, the comments and everything. I already have everything exported for this database, because it takes some minutes, and now I will open the database for the binary without symbols. This is the stripped version of the very same program. We have some function names, but not all, as we can see here. What we can do is simply compare this version to the previous one, which is this one. Say no, because we don't want to overwrite this time, and it will start running all the heuristics to compare the functions from both binaries, and after a while it will show all the results. Sometimes this takes a while, but this is one of the smallest databases, so it is going to finish soon, and here it is.

Okay: of the 1678 functions in the program, we already matched 1646 functions with a similarity ratio of 1.0. If we go to the partial matches, we have only four functions that were actually partial matches. Remember that this is actually the same binary. And now we can simply import everything from the database with symbols into the database with nothing. If we take a look at the structures, enumerations, etc., you will see that we have some of them, 73, but not all of them. We are now able to simply import everything: right click, import all functions. It asks: do you want to import everything? Yes, we want. It imports all the structures, enumerations, etc., and updates the current database with the names, the comments, the structures, the enumerations and so on. It is going to take a while because it has to update everything.

Okay, this is how it works. I already showed it with some demos and I already explained how it works as of today, but I have many, many changes in mind. As of today it works sometimes better than Zynamics BinDiff, but not always. For example, Zynamics BinDiff works very well when comparing binaries for different architectures, not just Intel x86 against AMD64 or ARM. For example, if you want to compare one SPARC binary with a RISC binary, which is a very crazy operation, Zynamics
BinDiff works better for now. I have some improvements in mind to make it work better than that. Some of the improvements that I have in mind are more heuristics, like using an intermediate language instead of raw assembly: apply optimizations over the intermediate language and then compare the intermediate representations. It should give very good results, and it could be used to compare anything against anything. Also, I plan to use symbolic execution based on this intermediate language, to determine whether basic blocks from different architectures are actually going to give the same result. So no matter if the instruction set is Intel, MIPS or PPC, if the final result of a basic block is the same, it is actually the same thing that is being done in the basic block. As of today I'm using VEX. VEX comes from Valgrind, so any instruction set supported by Valgrind will also be supported by Diaphora.

After that, I plan to create an independent GUI tool so it doesn't depend on IDA. This is because some people asked me for support for OllyDbg, or for Snowman, or for other tools; well, only a couple of guys for some of them, but there are people who have been asking for it.

And in the not so near future I want to improve it by adding a way to match a binary directly against source code. What about finding matches between the C or C++ source that you have for some application, let's say OpenSSL, and the binary that you are analyzing, which you know statically links the OpenSSL code? Then you don't have to lose your time analyzing the OpenSSL-related functions in a commercial product; you simply compare, import the symbols, and that's all. I plan to do it by comparing the abstract syntax tree of the source code with the one the decompiler generates. It is not easy at all. I plan to do it, maybe, if I'm lucky and I have enough time, in two to three years, so it's not tomorrow. Naturally, I need a C and C++ parser to generate an abstract syntax tree from C++ source or C source, but this is a non-trivial task that is by itself a full project, so it's not going to be available tomorrow.

And for now, this is all. Diaphora is open source, GPL, so you can use it, modify it, adapt it to your needs, and if you want you can send me patches. This is the URL of the project.