
Good morning, everyone. How are you? I think the slides will be up in a moment. My name is Ítalo, I'm an intern on the GReAT team at Kaspersky, doing research and mostly working with malware. Today I'm here to give a brief introduction to the cryptographic techniques that Brazilian malware uses for obfuscation. So, thank you.

So, what am I going to cover in the talk today? It's stuck, I think. What is the agenda for today? I'll start with a brief introduction to the Brazilian malware families: what they do and which population they attack here in Brazil. Then I'll talk a little about what obfuscation and cryptography are, and about XOR, a mathematical operation that I think you know, which Brazilian malware uses widely in a modified form. After that I'll talk about the other ciphers they use, divided into classical ciphers and modern ciphers, and a little about tools and what you can do to produce a solid analysis of these families despite the obfuscation.

So, what is the modus operandi of these malware families? Brazilian malware usually involves bank fraud: it wants access to your bank account in order to commit financial fraud. It usually spreads through phishing. Here is an example of a phishing email aimed at the general electricity company of Argentina, or Chile, if I'm not mistaken, from the Melcoz malware, which is one of the families I'm going to address here today.

Among the notable features of Brazilian malware, shared code is the one that most hinders analysis. Why? Because they share a lot of code. There is a remote access tool written in Delphi called XSPC, plus some variants of it, and several families embed that same tool in their code, normally written in Delphi, so it is very difficult to tell which family is which. We have to use other techniques for that. They also share other similarities, such as always using DLL side-loading and the same cryptographic algorithms, which are the focus of today's presentation. They are usually written in Delphi, .NET, or C++ in the case of some newer families, and to get at the victims' content they use remote access software and an image overlay, so they can display fake bank screens to capture sensitive information from the victims.

Now, a little about the obfuscation and cryptography they use. What is that? Obfuscation is what these criminals use to make their code, their malware in general, harder to analyze. It is the act of hiding, or making it hard to reach, sensitive information or the operation of the tool itself. For this presentation we can separate it into two types: code and data. Here we will focus on data obfuscation, which consists of hiding the data in some way, either through a cipher, which will be most of today's examples, or through other methods such as stack strings, if anyone is familiar with them. Code obfuscation is a little different: it modifies the structure of the compiled code itself. For example, ifs can be turned into switches, and for and while loops are changed so that you can no longer recognize them as loops immediately. It is usually a little harder to break.

Obfuscating is hiding; it's not the same as encrypting. But to do the obfuscation they use cryptographic algorithms. That looks like a problem: will we have to break these algorithms? The reality is that we don't need to break the algorithms. What we need is to identify the algorithms being used, the keys being used, and what is being encrypted with each algorithm. This can be done through both dynamic and static analysis, using tools like debuggers, decompilers, and disassemblers. Once you identify what is being encrypted, you can decrypt it in many ways, with many tools or your own scripts, and we'll take a look at that.

So, what do they usually hide using data obfuscation and cryptography? One example I put here is malware configuration. This example comes from the Javali malware: a type of configuration they usually place on pastebins or other open text repositories on the internet. It encrypts, with a specific cipher that we will see in detail later, the remote access address used to connect to the C2 server. When the malware communicates with the attacker, it needs an address that would be very short-lived if it were hardcoded in the malware. So it puts the address somewhere hidden, to make it harder for analysts or automated tools to identify, or sometimes in a remote location that the attacker can modify later. If this URL goes down, the attacker can create another one, for example. It doesn't always happen, but it can.

So, the cipher itself: the XOR that gave the presentation its name. XOR is a binary operation that works bit by bit. You have probably seen it in college or in a similar context. 0 with 0 gives 0, 0 with 1 gives 1, and 1 with 1 gives 0. Here we will normally be talking about XOR byte by byte. So 0x32 XOR 0x21 gives 0x13, for example. Using XOR you can easily hide strings, files, and whatever else you need to hide from whoever is analyzing. But they don't use just that; they make some modifications.

XOR is not used only by Brazilian malware; it is an operation present in real, secure ciphers that are used in legitimate contexts. Stream ciphers are a big family of ciphers based on XOR with a pseudo-randomly generated key. Examples of stream ciphers are this one, as well as ChaCha and Crypto-1, and the one-time pad is a theoretical cipher that can't be used in practice, based only on XOR with a randomly generated key of gigantic length that can't be reused. XOR is widely used for obfuscation because it is fast, it is supported in hardware, it's just an XOR instruction, and it is very simple to implement, while still hiding arbitrary data perfectly well, for instance XOR obfuscation in a red team or pentest exercise.

Say you have a string you want to hide inside your binary, for example "bank xyz", something common in malware because it monitors windows by a list of names, and you use a small key, for example "chave". You repeat the key if it is not long enough and take the ASCII values of the two strings. Then you XOR them, for example B with C. Oops, sorry. Here, the pointer. Where is the red pointer? On the other side. Right. "BANK XYZ" with "CHAVE", "CHAVE". Here we take B and C. Converted with the ASCII table to hexadecimal, they are 42 and 43. 42 XOR 43, you can use a calculator or do it by hand, gives 1. So you store this byte, 1, at a specific memory position, and it is recovered later during execution whenever the unencrypted string is needed. The key repeats, and this is the pattern: you do this byte by byte.

This is very simple, and to make our lives a little harder, the Brazilian malware, several families, uses a slightly modified version. Up here you have the encrypted string, stored in hexified form. So you see the string DB1A09 in memory. These are not bytes; this is a string you can see as text, in some cases even with the plain Linux strings tool. It is converted from hexadecimal to bytes and goes through an XOR. The first byte of the string, DB in this case, is set aside at the start. You begin with the key, which is the string below, and this is a real key: 1A XOR 5. That is not the value 5, but the character 5 in the ASCII table. When you do this XOR you get a new value, from which you subtract the previous byte of the string, which is DB. This is not an exact subtraction, but conceptually it works as one, and it gives an S. That is the first character of our string. You continue like this, always using the previous byte of the obfuscated string. That way you get, for example, the string "SELECT *".

Why did I say that the subtraction is modified? Because if the result of the subtraction would be negative, you don't do a subtraction modulo 256 on the byte. Instead, you add 255. So the byte is off by one. It isn't exactly a subtraction modulo 256.
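The two worked examples above can be reproduced with a short Python sketch. The function names are hypothetical, and the handling of the tie case (XOR result equal to the previous byte) is an assumption, since the talk only describes the negative case. Only the first two key characters, 5 and V, are used here because they are enough for the DB1A09 example.

```python
from itertools import cycle


def xor_repeating_key(data: bytes, key: bytes) -> bytes:
    """Plain repeating-key XOR; applying it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def decrypt_modified_xor(hex_text: str, key: str) -> str:
    """Decode the hexified, modified XOR variant described in the talk.

    Assumption: when the XOR result equals the previous byte, the
    "+255" branch is taken. The talk only covers the negative case.
    """
    raw = bytes.fromhex(hex_text)
    if not raw:
        return ""
    previous = raw[0]  # first byte is only a seed, it is never decoded
    decoded = []
    for index, current in enumerate(raw[1:]):
        key_byte = ord(key[index % len(key)])  # wrap around the key
        value = current ^ key_byte
        if value <= previous:
            value = value + 255 - previous  # off-by-one versus modulo 256
        else:
            value = value - previous
        decoded.append(chr(value))
        previous = current  # always the previous *encrypted* byte
    return "".join(decoded)


# 'B' (0x42) XOR 'C' (0x43) gives 1, as in the "BANK XYZ" / "CHAVE" example.
print(xor_repeating_key(b"BANK XYZ", b"CHAVE")[0])  # 1

# 0x1A XOR '5' = 0x2F; 0x2F + 255 - 0xDB = 83 = 'S'
# 0x09 XOR 'V' = 0x5F; 0x5F - 0x1A = 69 = 'E'
print(decrypt_modified_xor("DB1A09", "5V"))  # SE
```

Because each step depends on the previous encrypted byte, a decryptor like this has to walk the string in order; it cannot decode a byte in isolation the way plain XOR can.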
Families that use this specific cipher: I put here as examples Melcoz, Javali and Grandoreiro, which are Delphi-based families, very well established in recent years. They are a little older, but that doesn't mean they're not active. They're very active, actually, some of the most active. Melcoz is interesting because it uses this same cipher not only in its final stage, to hide strings, but also in the initial stage of infection, with the same key, 5VANV4SDM and several more characters, it's a fairly long key, implemented in a completely different language. So the same cipher is in Delphi and in PowerShell. Thank you.

Javali is the example we used before: that C2 server configuration is encrypted with this same cipher. Grandoreiro uses this cipher, and nowadays others too, to encrypt strings, the configuration of its domain generation algorithm, which I can talk a little more about later, and other important information. It is the same cipher across many different families, very similar.

Here I need this to work, OK? This is the code for this cipher implemented in PowerShell. Just zooming in a little so you can see a detail. We have some technical difficulties here. Two taps. You can go back, actually. I'll stay here.
Here is the text. At the beginning of this function, written in PowerShell, you can see the key; it is a little longer than I showed before, and it starts with 5VAN. This is the PowerShell that Melcoz uses. Here it creates an empty string to hold the decrypted result, and at some point it goes through the XOR. It takes the characters two by two, converts them to bytes, treating each pair as a hexadecimal number, and loops until the end of the string. Then we have the cases below. The cases were what I wanted to show, but it's a little bit... There are some ifs and elses. Why do we have these ifs and elses that we wouldn't have if it were a normal XOR? Because of that 255 case. It checks which is larger, the result of the XOR or the previous byte of the string, and depending on which one, it does either a plain subtraction or a subtraction plus 255. That is the point of this PowerShell script, which has some variables already renamed to be a little more readable, for those who want to look at it later.

This is how it looks in assembly. This one was not taken from a Melcoz sample; it was taken from a Grandoreiro sample. This is the flow graph, to understand a little how it works. The block at the top is where the XOR is actually done. I think those sitting closer to the front can read an XOR there, but you can also see it here. Below we have the split into two paths: on the left there is only a subtraction; on the right there is a subtraction plus 255. At the end we have the storing of the decrypted result and the return to the beginning of the loop. You can see a thin arrow going up the side. Here, a little more detail. Okay. At the top, the XOR; on one side it adds 0FF in hexadecimal, which we'll see later; on the other side it doesn't. This is the important part. Here the remainder of a division is computed because the key, remember, is not long enough. Each time the loop comes back, there is a remainder operation to get the correct position in the key. It wraps around the key to use the correct character. OK.

They can't rely on this one cipher alone, because detection would be easier and every new campaign would be analyzed extremely fast, and that's why they use other techniques. They use, for example, several classical ciphers. The simple substitution cipher is the best known when we talk about classical ciphers, and two well-known families use it. CHAVECLOAK, which is a more recent family that uses C++, uses not only the XOR we saw earlier but also a customized simple substitution cipher with its own alphabet, and Grandoreiro does the same thing, using simple substitution together with the XOR. So there are layers, which makes it a little more laborious for the analyst. The cipher works by replacing a letter of the normal alphabet, for example A, with a letter of a new alphabet. In this case I gave Y as an example, but they usually use fairly random characters, like "@", that stop us from recognizing the words.

The polyalphabetic substitution cipher is a little more complicated and not used much by these malware families. The only example I put here that I know of is Amavaldo, which uses the Vigenère cipher. The Vigenère cipher works like a simple substitution cipher, but it has a key in addition to the alphabet itself. You take a key character together with a string character and look them up in this big table to see which character is the replacement. In the Vigenère cipher, this table is basically the set of shifted permutations of the alphabet. Amavaldo uses this cipher and also ends up delivering other malware families. It works a bit like a loader.

Modern ciphers. The classical ciphers I showed are not cryptographically secure, but the malware families don't care much about that. Modern ciphers are secure, and they use them too, basically for variety. What are they? Block ciphers are a big family of modern ciphers, just like stream ciphers. Stream ciphers are based on XOR and on the idea of generating a longer key, as I mentioned before, with a pseudo-random generator function. Block ciphers divide the text being encrypted or decrypted into blocks of fixed size, for example 16 bytes. These blocks go through several complex mathematical operations that vary with the cipher, involving group theory, for example, and AES is the most common example. They have several modes of operation, which give Brazilian malware a little more variety in how they are used. For example, up here we have ECB mode, which is the most standard one. You split the data into blocks and join them back together after passing each one separately through the block cipher.
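The block-by-block structure of ECB can be sketched in a few lines of Python. This is only an illustration of the mode: `toy_block_encrypt` is a hypothetical, insecure stand-in for a real block cipher such as AES, chosen so the example runs with the standard library alone.

```python
BLOCK_SIZE = 16  # bytes, the block size used by AES


def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    """Stand-in for a real block cipher such as AES. NOT secure."""
    return bytes(((b ^ k) + 1) % 256 for b, k in zip(block, key))


def ecb_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """ECB mode: split into fixed-size blocks, encrypt each one on its own."""
    if len(key) != BLOCK_SIZE or len(plaintext) % BLOCK_SIZE != 0:
        raise ValueError("key and plaintext must align to the block size")
    blocks = [
        plaintext[i:i + BLOCK_SIZE]
        for i in range(0, len(plaintext), BLOCK_SIZE)
    ]
    return b"".join(toy_block_encrypt(block, key) for block in blocks)


key = bytes(range(BLOCK_SIZE))
ciphertext = ecb_encrypt(b"A" * 32 + b"B" * 16, key)
first, second, third = (
    ciphertext[i:i + BLOCK_SIZE] for i in range(0, len(ciphertext), BLOCK_SIZE)
)

# Identical plaintext blocks give identical ciphertext blocks in ECB.
print(first == second)  # True
print(first == third)   # False
```

The repeated ciphertext block is the property that makes ECB a poor fit for structured data, and it holds for any block cipher used in this mode, not just the toy one here.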
CBC mode adds a bit of entropy, since in some cases, images for example, ECB is not good. In addition to the key, you have an initialization value that is XORed into the first block, and the result of encrypting the first block is used as the initialization value for the second block, and so on. These two modes are very commonly used with AES by Brazilian malware.

JanelaRAT, for example, is a newer malware family that uses .NET; it's present here in Brazil and in Mexico, and it uses AES-CBC with a very characteristic key, which is the MD5 hash of the string 8521. It computes this hash when decrypting the strings it needs.

Melcoz uses AES as well, besides the XOR, but in ECB mode. And usually it uses AES only for its configuration. It has a very characteristic configuration, with fields separated by circumflex accents. This configuration contains things like a connection address, as in Javali, but also, for example, Bitcoin wallets. It uses AES in ECB mode with a hash, part of the SHA-1 hash, of the same key it uses for the XOR cipher, that same 5VAN. It uses the same key. And it's when we see these key repetitions that we start to realize we can use them to help identify the family, if you're in doubt. If the key that shows up is 5VAN something, you're probably dealing with Melcoz, and that's very interesting; it helps with the shared-code problem to a certain extent.

Grandoreiro is one of the best-known families in Brazil, because there was an operation by the Federal Police, which Kaspersky even helped with by sharing information, that led to the arrest of a group. It uses the XOR and the simple substitution I mentioned, but it isn't satisfied with that, and in addition it uses a slightly more complex AES mode, different from these: CTS mode, which stands for ciphertext stealing. This mode is used in more recent variants that came out only after the Federal Police operation, carried out together with Interpol and the Spanish police. It's very interesting what it does, using many different ciphers in different variants. Some older variants are still active, and you can see them using only the XOR. It's a family with a certain variety.

Coyote is similar to JanelaRAT in the sense that it is a more recent family that uses .NET, and it uses AES, but AES-CBC with random keys for each sample. It doesn't have a key as characteristic as JanelaRAT's, but its decryption happens in a very characteristic way too: instead of just taking the string and calling the function that decrypts it, it has a large table inside a function. When you call the function to get a decrypted string, it walks through this table, checking whether each entry is the string you want, and only then calls the actual decryption function. It's slower and more inefficient. Why it does that is not very clear, but it makes it easier to tell that you're looking at Coyote, in addition to its other characteristics.

Asymmetric cryptography is not used much, but I saw some JanelaRAT variants using not only AES, as mentioned earlier, but also RSA. The math is a little simpler in this case: you use modular exponentiation, with the public and private keys protected by the difficulty of factoring very large numbers. There's an N, which is the product of two primes of, let's say, 512 bits each. That's hard to factor. The private key is P, Q and D. In the case of JanelaRAT, it's interesting that some strings were encrypted using only AES, other strings using AES plus RSA, and others using only RSA. The RSA didn't have one specific key that was always shared. In fact, inside the same sample there were several RSA keys used for different groups of strings. So when you're writing something to deobfuscate, you need to identify what is used for each obfuscated string to be able to decrypt it correctly.

Speaking of tools: I talked about what the ciphers are, but how do we identify which cipher is in use, and how, knowing the cipher, can we make the binary more readable for us, or just extract the information, if that's the goal? There are several tools, both ready-made ones and ones you write yourself. So, Python. I put Python here as an image because, normally, if you're analyzing something new, and you are usually looking for new things to analyze, nothing ready-made will exist and you'll have to write something yourself. Python is useful here because it has APIs and libraries that make it easier to interact with the binary. IDA itself, for example, which is the standard disassembler and decompiler in the malware analysis industry, lets you write scripts that interact with its libraries, its API, and you can not only extract the strings but also add comments with the correct value of each string, or decrypt them and replace them in the binary, to get an easier view in the decompilation and in the disassembly.

de4dot is a very useful tool if you are analyzing malware written in .NET, or any obfuscated .NET. It handles data deobfuscation, as here: either you are lucky and it automatically identifies what is being used, decrypts everything, substitutes the strings and leaves them nice and readable in the binary for you to analyze, .NET being a relatively easy language to decompile; or you have to drive it manually, find out which function is being called and ask it to replace all calls to that function with their result. You pass the function to it, among other interfaces it has a CLI, and it finds all the calls to that function inside the binary, substitutes them, and leaves the binary almost perfect for you to analyze. For code deobfuscation, not data, it has some ways to be extended so you can write your own deobfuscation, or it can automatically identify which obfuscation is being used. In some cases that works, in others not. Remember that it's specific to .NET, unfortunately.

hrtng, this one is interesting: it's a plugin for IDA, so it follows the same idea as the Python you can write for IDA, but it's a ready-made plugin. It was developed at Kaspersky, by members of GReAT, and it helps not only with this string decryption functionality, where you can point it at functions and it can automatically identify some things, but also with other types of deobfuscation and things you need to do while analyzing extremely sophisticated malware. These are the three tools I brought, but there are many others and unfortunately I can't give a complete list here. That's the presentation, thank you very much. If anyone has questions, you can ask.

Brother, great presentation. I just wanted to know what you do to identify a family. Let me try to be a little clearer. We have variants and we have families. We have different names. Bankers: if you shake a tree, a few fall out. We have Grandoreiro, a machete, a sword, and so on. But at the same time we have variants. I don't do malware analysis much, actually. And for me it's sometimes hard to tell whether something is a variant or a different family. And within the scenario you showed, there's a lot of shared code and everything. So attribution is something very difficult, at least for me. And I wanted to know if there's something you look at in the malware's behavior, or in the malware code itself, that helps you distinguish one family from another, or know whether it's a variant or another family.

... the DGA is very characteristic, and if it is present you can usually say with high confidence that you are dealing with Grandoreiro. In the case of Melcoz and Javali, the format of their configurations is already well known, documented publicly and privately, so that's something else you can identify. Cryptography ends up being widely shared, but when you have a well-made list of which families use what, it's relatively easy to say with confidence that it is a given family. So if I'm analyzing a sample and I see AES being used with the key generated from a SHA-1 hash of that 5VAN something, I will say with some confidence that it is Melcoz.
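As a rough sketch of how an analyst's script might reproduce the hash-derived AES keys mentioned in the talk, the snippet below uses Python's `hashlib`. Several details are assumptions and not taken from the talk: the UTF-8 encoding of the passphrase, the choice of the first 16 bytes as the "part" of the SHA-1 hash, and the function names. The string "5VAN" is only the key prefix quoted in the talk, used as a placeholder; the real key is longer.

```python
import hashlib

AES_128_KEY_SIZE = 16  # bytes; assumed key size for the truncated hash


def md5_derived_key(passphrase: str, encoding: str = "utf-8") -> bytes:
    """MD5 digest of a passphrase, usable directly as a 16-byte key."""
    return hashlib.md5(passphrase.encode(encoding)).digest()


def sha1_derived_key(
    passphrase: str,
    size: int = AES_128_KEY_SIZE,
    encoding: str = "utf-8",
) -> bytes:
    """First `size` bytes of the SHA-1 digest (which part is an assumption)."""
    return hashlib.sha1(passphrase.encode(encoding)).digest()[:size]


# Key derived from the string 8521, as described for the AES-CBC case.
cbc_key = md5_derived_key("8521")

# Placeholder prefix only: the full XOR key from the talk is longer.
ecb_key = sha1_derived_key("5VAN")

print(len(cbc_key), len(ecb_key))  # 16 16
```

If a sample decrypts cleanly with a key derived this way, that is one data point for attribution, to be weighed together with the other indicators rather than on its own.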
But it's always possible that some other malware takes that same key and reuses it, because there is communication between these developers; that becomes clear when techniques get reused. CHAVECLOAK started using substitution ciphers. Shortly after the Federal Police operation, Grandoreiro started using substitution ciphers. So if you limit yourself to just one thing, you will probably at some point give a wrong attribution. It has to be a set of things to give a better attribution.

Any more questions? Thank you for the lecture, Ítalo. Guys, now we're going to take a break for lunch. Remember that in the kit you received there is a Vale Pastel; just pick it up at the two tents beside the hotel, okay? And then I'll just give a thank you to the sponsors here: Apura, Elytron Gemina, Google, Hackers Rangers. Thank you, guys.