
When Home Isn't Safe: Detecting Malicious Networks Hidden Behind Residential Proxies

BSides Seattle · 2026 · 21:33 · 46 views · Published 2026-03
About this talk
Bsides Seattle February 27-27, 2026 lecture Presenter(s): Duong Dinh
Show transcript [en]

So, a little bit about myself. I'm more of a software engineer. When you're a software engineer, you like to build products, and a lot of the time you come to the realization that, uh... just a little closer? >> Hello. Hello. Better. >> Oh, okay. Yeah. So when you build a project, you come to the realization that data is everything. Say, for instance, you build your first SaaS and you want to write some script that goes through the internet to grab some data, ethically. As you're doing that, you realize there are a lot of factors stopping you from getting the data, and that's when

I started getting more into networking, to look more into it and figure out what it is. Okay, so here's one example that I found. I run a small software-as-a-service company. When you have software that allows people to buy or subscribe, you realize one thing: there are a lot of bad people. So here's one example. This guy was doing something called, I think, credit card shuffling. I don't know where he got all these credit cards from. He had a lot of people's names and credit cards and, you

know, all the data, and he was trying to subscribe to a lot of my services to check which credit cards work. One thing he forgot to hide was his email: he used the same email for pretty much every single credit card. So my system flagged it, I looked into it, and we went through all of these IPs and saw a lot of residential IPs, and I was like, okay, that would be a fun topic to talk about at BSides Seattle: residential IPs. So I worked with Stripe a little bit to figure out who that was, and I think we got him handled

and that's pretty much it. So let's talk briefly about residential IPs. I believe they appeared around 2014 and have grown significantly in recent years. The thing about a lot of these residential IP services is the lack of transparency: the companies do not disclose their exit nodes, or how they even get the IPs in the first place. Some claim to have millions of IPs, and a simple Google search will show you millions of results, all of these companies where you can subscribe and use the IPs for whatever purpose you want. So now the quick question: how do they do

it? How do they grab your grandma's IP and use it as their own IP? There are a lot of ways they can do it. One is, say for instance, you get a free VPN, or they have some kind of bandwidth-sharing program where, hey, if you do this, we're going to give you some money. And usually when things are free, they aren't really free: you are the product. So that's essentially what I found out about how they do it. But they don't really disclose it, so it's really hard to know. A lot of the time they also

could be hacking, and once they have a backdoor of some kind, they are able to use your IP address. So, why would you use a residential IP? If anybody in this room has ever tried to scrape something on the internet, you realize that a lot of websites have pretty good detection: if you use a well-known VPN to shuffle your IP address, or a data center IP, a lot of websites will block you pretty much immediately. So one way that you can scrape, which is not really

recommended if the website does not allow scraping, is through residential IPs. For the server, it's essentially hard to tell whether you're trying to scrape the website or not. And you can do a lot worse with residential IPs than just that, but scraping is something I know pretty well, so that's what I gave as an example. So let's talk a little about detection. As I was going through this, playing around with my server and running a few tests, I can somewhat say that there are a

few ways that you can detect it, although it's really hard; these people are really good at hiding their stuff. These are just the general ways you would be able to do it, and every one of these methods has a lot of drawbacks. Let's talk more about the TTL. Wait, where's the slide? I missed a slide. Hold on. Okay, here it is. Someone actually deleted it. So, time-to-live analysis. The time to live encodes how many hops a packet has traveled, because it is decremented by one at each hop, and different OSes have different starting TTLs. You can generally see that in this equation right there: observed TTL

is the initial TTL minus the hop count. Sometimes residential traffic and proxy traffic take different network paths, with different hop counts. So for instance, if you receive a packet with a TTL of 52, most likely the initial TTL was 64 and it traveled 12 hops. Then you can maybe build a system that has a baseline, say that the same city is going to be around 12 hops or something like that, and you can use that to somewhat detect anomalies. You can also check that the same IP has a stable TTL; you can, I guess, build somewhat of a stateful

table for all of that. The way you should be able to do it is to use eBPF. Think of eBPF as a program that runs safely in the kernel without modifying the kernel or loading kernel modules. The one drawback is that networks often use BGP and ECMP, so the path is going to vary and look weird, and mobile networks would also have a really weird path. So it's really hard to tell from TTL alone. Another indicator, somewhat, is reverse DNS, or rDNS, fingerprinting. But here's the thing, the residential proxy

is designed to bypass this. But you can still use it to check whether something is a proxy or not. What I've come to realize is that a lot of people who use residential proxies also mix them with a VPN or other proxies. They use a VPN, and when the VPN doesn't work, they switch to a residential proxy and just rotate around like that. So they don't always use a residential proxy. rDNS is a somewhat weaker signal, but you can always build it into your detection system: just add a few lines of code, and every time a request violates a

certain rule, you just add maybe five to your baseline score. Another one that is a little bit better is JA3 and TLS session reuse. So when the client requests, um... oh, I could do this the whole time. It's essentially the TLS handshake and then the HTTP request. The client sends a TLS ClientHello, and you get an MD5 hash from it, which essentially answers the question: what kind of TLS client is this? Bots and proxy clients often use curl, Python, or a headless browser, so they're a lot easier to spot. JA4 is just an improvement on JA3, but in these slides we're going

to go more into JA4+. You can also look at TLS session reuse, because a lot of bots don't really reuse a session; they create a new one. As somebody who has built bots, it's also best to use a new session rather than reuse one, based on my experience of pentesting and playing around with this. Now let's look at JA4+. JA3 is TLS only, which is no longer sufficient. Modern detection requires a multi-layer approach, which is why JA4+ is somewhat warranted, I guess. Here's one piece of research that I found online. They use JA4T

and TCP features, which are your packet size and TCP window. It goes across the layers, from layer 3 all the way to layer 7. That makes it a lot harder for the proxy, or anybody providing the proxy, to spoof, because you essentially need to modify the kernel to a certain degree. And as you can see in the signature, this is only about 60% effective, so it's not a definitive method, but it's still a good way to detect whether it's a residential proxy or not. Let's look a little deeper into this. The JA4T fingerprint is essentially the SYN, because the SYN packet is the only moment, hold

on, where the client reveals its true TCP configuration. You can see here you have an IP header of 20 bytes and a TCP header of 20 to 60 bytes, which includes options, and the options have MSS, window scale, SACK, and all that. We'll look more into the MSS here. So let's break down the JA4T fingerprint. Here's an example I brought up earlier: one of the ways you can tell regular traffic versus proxy is by looking at this. MSS is essentially derived from the path MTU, and it reflects the encapsulation overhead, which can be computed with this equation right here. Since proxies and VPNs

introduce encapsulation, this reduces the effective MTU. What they found is that the normal MSS is around 1460, while the proxy MSS is around 1380 to 1400. Also, the option ordering is not standardized, and it's pretty difficult to manually fake without kernel-level changes. So this can also be used as a way to detect proxies. Now let's look at how you would be able to collect it. Like I said before, eBPF is a pretty new topic. I'm pretty sure it came about somewhat recently, and there's a lot of research going into it, which I'm currently kind of doing. So I don't really understand eBPF

to the fullest; I'm still doing more research into eBPF. Since JA4T and TTL rely on TCP/IP packet fields, these fields obviously are not visible at the application layer, like Node or nginx, which is why you need access to the packet before HTTP processing, and that's why you bring eBPF into the equation. It allows real-time packet inspection, and it's pretty fast, because there's no kernel module change; you don't really mess with the kernel. eBPF is an upgraded version of BPF, or like the

original BPF. I'm pretty sure the original BPF was used just to inspect traffic and everything in between. With eBPF, you can now hook into the kernel. You can actually write a lot of detection systems with eBPF, because you can see all the calls from a program. But like I said, there are obviously a lot of limitations. It's hard to really tell whether somebody is definitively using a proxy to attack your site or not; you kind of need to use your intuition. For instance, when I run my service, I see a lot of these requests

and then you look at the logs and kind of have to use your intuition to decide whether it's something worth looking into, blocking, or anything like that. AI is another tool you can use, but from my personal experience human intuition is always a little bit better than AI, since AI works from somewhat of a distribution; it's probabilistic. Human intuition is always going to be better, in my opinion. So, a few cases of notable industry response, as you can see here. Have you ever tried to buy a ticket for a concert, and

then all of a sudden you just can't get it, because the bots are getting all of them? So the industry is trying to work to help reduce the bots, is what I'm trying to say. So here's the thing. I was going to show you guys a demo, but in doing the demo I actually leaked one of the really crucial IP addresses that I use in my infrastructure. So I can only show, I guess, a redacted version of the terminal. Essentially what I'm showing here is that when you install a VPN, it's really easy for the VPN to use you as an exit node. So you see, this is my VPN. I curl

something and you can see it's my VPN IP, and then I can use my other computer as an exit, using the VPN to go out through that computer's Wi-Fi, I guess. So essentially that's what this is. Also, after working with this, I kind of screwed up the routing table on my computer, so this cannot connect to the internet except my for internet. So I also can't show you live, because then I would need to configure my VPN and add all the routes manually, which is a pain in the ass. But yeah, that's pretty much it. Here you can see I have to manually add a bunch of

routes, I don't know, whenever I want to go somewhere. And I don't know how I can reset that, but yeah. It's pretty cool when you run your own infrastructure. I can't really show you, but this is essentially the infrastructure that I have. What's pretty cool is that I own a /24 block of public IPs. I run it in a data center in Chicago, and that's more of my research project, where I buy a bunch of routers and switches, colocate them in a Chicago data center, and then just run a bunch of things to see what's going on. And one of

the things is, when I was doing the VPN demo, I leaked one of my very crucial IP addresses for my VPN, with which you could literally get into all of my infrastructure, so I had to redact it. But yeah, that's pretty much it. If you guys have any questions... any questions? >> Thank you.
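
The two packet-level heuristics from the talk (observed TTL = initial TTL minus hop count, and an MSS below the usual 1460) can be put into a small scoring sketch. This is only an illustration under stated assumptions, not the speaker's implementation: the function names, the 5-hop tolerance, and the 60-byte overhead threshold are hypothetical, while the 12-hop baseline, the "add maybe five" score step, and the 1460 versus 1380 to 1400 MSS figures come from the talk. In practice the TTL and MSS values would be read from the SYN packet, for example by an eBPF program.

```python
from __future__ import annotations

from dataclasses import dataclass

# Widely used initial TTLs: 64 (Linux, macOS), 128 (Windows), 255 (some network gear).
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> tuple[int, int]:
    """Return (assumed_initial_ttl, hop_count) using observed = initial - hops.

    Assumes the sender used the smallest common initial TTL >= observed_ttl.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    initial = next(t for t in COMMON_INITIAL_TTLS if observed_ttl <= t)
    return initial, initial - observed_ttl


def mss_overhead(mss: int, link_mtu: int = 1500) -> int:
    """Bytes of apparent encapsulation overhead on an IPv4 path.

    Without tunnelling, MSS = MTU - 20 (IPv4 header) - 20 (TCP header),
    so a 1500-byte Ethernet MTU gives the 1460 quoted in the talk.
    """
    return (link_mtu - 40) - mss


@dataclass
class SynSignals:
    observed_ttl: int  # TTL field of the SYN packet
    mss: int           # MSS option carried in the SYN packet


def proxy_score(
    sig: SynSignals,
    baseline_hops: int = 12,   # example baseline from the talk
    hop_tolerance: int = 5,    # hypothetical tolerance around the baseline
    min_overhead: int = 60,    # hypothetical: 1460 - 1400
    points: int = 5,           # "add maybe five" per violated rule
) -> int:
    """Add points for each heuristic the SYN violates; higher is more suspicious."""
    score = 0
    _, hops = estimate_hops(sig.observed_ttl)
    if abs(hops - baseline_hops) > hop_tolerance:
        score += points
    if mss_overhead(sig.mss) >= min_overhead:
        score += points
    return score


print(estimate_hops(52))                    # (64, 12)
print(proxy_score(SynSignals(52, 1460)))    # 0
print(proxy_score(SynSignals(40, 1380)))    # 10
```

As the talk notes, neither signal is conclusive on its own (ECMP, mobile networks, and ordinary VPN use all shift these values), so a score like this is better treated as one input to a per-session decision than as a block list.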

>> At least in my mind, a lot of these detection capabilities are a little bit more expensive, right? They require, you know, another looking at additional layers in the OSI model essentially identifying, right? So presumably if you identify an IP address or some indicator, you're going to store that, right? And then perhaps you have another legitimate user that happens to have a malicious device on their network. So they're going to be clustered >> with that traffic, >> right, >> just based on the IP address. What do you do in that situation? Right? How do you handle the risk tradeoff of blocking a potentially legitimate user that happens to have essentially malware, >> right? >> Okay. So,

>> I think the risk comes from where, for example, they use your device as an exit node to be able to use your service from wherever they are. So how do we differentiate and discern whether or not the request is actually coming from the user, like you said, >> Right. >> or coming through, like, a device within, like, our... >> Right. So, yeah, this is actually a good question. It's really hard to do, because you don't want to block the IP of a person who legitimately uses the website when there's somebody else literally trying to attack it from that IP. So, here's the

deal. Either you terminate that session, which is what I think you should do instead of blocking the IP, or, in the grand scheme of things, maybe blocking one IP is what you just have to do. But generally I believe you should just try to terminate that session, if that makes sense. You know what I mean? If that session is attacking, then just terminate it, I suppose. >> Got it. So your recommendation is to apply the mitigation specifically to the session that you derived the signals from, not to cache the IP address or any

other further association. >> Right. So that's the whole idea of JA4+, right? You focus on the individual connection instead of the whole thing, you know what I mean? >> Right. I think the challenge there is just figuring out how to handle the cost of doing that, right? >> Right, yeah. >> ...strong, instead of, if you've done a huge sort of computational... >> It makes it harder for them, yeah, >> to reuse. >> Like, for instance, when I run my, I guess, scraping thing, a lot of the websites have CAPTCHAs, and Google Gemini solves the CAPTCHA pretty easily.

So, you know what I mean? It's obviously going to cost me a little bit more to use Gemini to solve the CAPTCHA, but they can't really stop me from doing it. It's really hard. Now, for instance, DuckDuckGo applies something that I thought was pretty smart. When you scrape DuckDuckGo, sometimes it returns a 202, which is like, it worked, but you have to wait for it to process. So when you write a script, you're like, oh, it's a 202, and you skip it; you never wait for the result. So it just makes it harder: you have to re-request, or wait, until it

becomes a 200, and then you see all the data. Essentially what we're trying to do is make it more expensive for you to scrape or attack or whatever. >> Do you have any visibility into how they're sourcing the IPs? You mentioned the free VPN apps or browser extensions. Do you know how common that is for sourcing IPs compared to hacking IoT devices and things like that? >> I'm pretty sure the bandwidth sharing and the free VPNs are way more popular than hacked IoT devices, because if you do a simple Google search, a lot of these programs offer

incentives for people to do it. For a lot of people, free stuff is good, right? Or sometimes they give them free crypto if they allow bandwidth sharing, which is one of the ways they can do it. Yeah. Do you have anything? >> Okay. Yeah. That's pretty much it. Thank you for being here, even though I'm a little bit... Hey.
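
The "return 202 and make the client come back" idea from the Q&A, combined with the advice to act on the session rather than the IP, could look roughly like the sketch below from the defender's side. Everything here is an assumption for illustration: the class name, the `suspicious` flag (which would come from whatever scoring the site already does), and the number of required polls are hypothetical, and nothing is known about how DuckDuckGo actually implements its behaviour.

```python
from collections import defaultdict


class DeferredResponder:
    """Make flagged sessions poll several times before they get real content."""

    def __init__(self, required_polls: int = 2):
        self.required_polls = required_polls
        self._polls = defaultdict(int)  # session id -> deferred requests so far

    def status_for(self, session_id: str, suspicious: bool) -> int:
        """Return the HTTP status to send for this request."""
        if not suspicious:
            return 200  # normal sessions are served immediately
        self._polls[session_id] += 1
        if self._polls[session_id] <= self.required_polls:
            return 202  # "accepted, still processing": the client must retry
        return 200      # the client waited and retried, so serve the data


responder = DeferredResponder(required_polls=2)
print([responder.status_for("session-a", suspicious=True) for _ in range(3)])
# [202, 202, 200]
print(responder.status_for("session-b", suspicious=False))
# 200
```

The state is keyed by session, so another user behind the same residential IP is not slowed down, and a script that treats 202 as final simply never sees the data.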