Why is a 2,000-IP Botnet Torrenting Ubuntu?

“Hrmm,” Josh begins, pasting a screenshot of Deluge¹ uploading an Ubuntu 21.04 Desktop ISO at 112 MB/s to some 2,800 peers. “It’s been doing this for a few days.” Pasting another screenshot, a mass of Chinese flags listed next to IPs, he quips: “Anywhere from 5-15 connections from almost every IP in the 110.189.107.29-230 space. I’m missing, like, maybe 2 addresses.”

Loading the same torrent into my Deluge instance confirmed what Josh saw: downloading the Ubuntu ISO worked fine (a little slow, but whatever), but connections from Chinese IP space quickly climbed until my instance was pushing over 240 MB/s up. But no matter how much we uploaded (eventually pushing ~5 Gbps from our servers combined), these leechers would always report 0% completion, as if any data sent was being immediately thrown out. After years spent professionally mitigating the security and availability impact a sizeable botnet can have, it’s odd to find one in my virtual backyard, slurping bandwidth off some Linux ISOs.²

Now convinced that something was wrong, we started looking around and found a small thread on Hacker News where people described the same symptoms, but the questions kept mounting. Is this just tied to Chinese IP space? Just how big is this botnet? Perhaps most importantly: why would someone do this? Are they just trying to waste bandwidth, or is something else going on? While we still can’t answer all of those confidently, we can make some educated guesses with a bit of research.

Characterizing the Botnet

Fiddling with some Deluge settings allowed me to confirm a few things about the bot quickly, most of which had already been documented by Hacker News commenters:

This bot always uses an empty client string, which Deluge renders as “Unknown.”
The bot only supports unencrypted connections.
By checking Ubuntu’s tracker information (archived here), it seems that this botnet is swarming around four different torrents currently. We started torrenting the other ISOs immediately, because who doesn’t like wasting terabytes of bandwidth?
Each connection the bot makes lives for only about 10 minutes in my observations, sometimes less. This artificially inflates the number of leechers reported by the Ubuntu tracker, which has an announce period of 30 minutes.
Even when limiting my connection to just a couple of IPs and feeding them data for a few hours (through the myriad clients that come and go), no client ever reported any complete chunks. Given that each piece is 256 KB, and I’ve observed single connections pulling over 2 Mbit from my servers, it’s clear that the 0% completion reported by all clients is not happening by chance.

This bot is therefore surprisingly easy to block based on its characterization alone, and without any specialized plugins or client modification³ - if you are an Ubuntu seeder and need to mitigate this botnet’s activity due to bandwidth concerns, enable “require encryption” on your client and restart it now.
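The traits above are also enough for a behavioral rule like the one suggested in the notes (blocking unknown client strings, or flagging peers that take more than a piece's worth of data yet report nothing complete). Here is a minimal sketch, assuming a hypothetical `PeerStats` record that you would fill in from your own client's peer list; the field names and the 256 KB piece size are assumptions for illustration, not Deluge's API.

```python
from dataclasses import dataclass

# Assumed piece size for the affected Ubuntu torrents (256 KB, as noted above).
PIECE_SIZE = 256 * 1024


@dataclass
class PeerStats:
    """Hypothetical per-peer record; fill it from your own client's peer list."""

    ip: str
    client: str           # client string the peer advertised
    encrypted: bool       # whether the connection uses protocol encryption
    bytes_sent: int       # bytes we have uploaded to this peer
    pieces_reported: int  # pieces the peer has announced as complete


def looks_like_bot(peer: PeerStats, piece_size: int = PIECE_SIZE) -> bool:
    """Return True only if a peer matches every trait observed for the bot.

    Traits: an empty client string, an unencrypted connection, and more than
    one piece's worth of data received from us while still reporting zero
    completed pieces.
    """
    return (
        peer.client.strip() == ""
        and not peer.encrypted
        and peer.bytes_sent > piece_size
        and peer.pieces_reported == 0
    )


if __name__ == "__main__":
    # Documentation-range IPs (RFC 5737), not real bot addresses.
    peers = [
        PeerStats("198.51.100.10", "", False, 3 * PIECE_SIZE, 0),
        PeerStats("203.0.113.7", "Deluge 2.0.3", True, 3 * PIECE_SIZE, 2),
    ]
    for p in peers:
        print(p.ip, "bot-like" if looks_like_bot(p) else "ok")
```

Requiring all four traits at once keeps false positives low, at the cost of missing the bot if it ever changes one of them.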

While this mitigates the issue for now, characterizing this botnet raised more questions than it answered: if it is so trivial to block, maybe this isn’t just a bandwidth-wasting attack.

Observing the Botnet

Reading off clients from Deluge manually isn’t what most would consider “good research” or “productive” or “fun,” so we started automating out the ~~boring~~ unscientific stuff.

via Deluge

Initially, we collected data off our Deluge daemons directly with the console’s info command, which outputs text data about the torrent status and peers. We collected IPs for several hours, filtered for all connections that used the empty client string, and used RDAP to check which Autonomous System each IP block belonged to. The results were surprisingly centralized: of the 2,069 bot IPs observed, ~99.9% resided in:

AS4134 - China Telecom
AS4837 - China Unicom
AS9808 - Guangdong Mobile Communication Co. Ltd.
AS24547 - Hebei Mobile Communication Co. Ltd.

The remaining ~0.1% of IPs belong to:

AS24444 - Shandong Mobile Communications Co. Ltd.
AS56042 - China Mobile Communications Co.
AS17621 - China Unicom Shanghai
AS4812 - China Telecom (Group)
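A per-AS tally like this can be reproduced from the appendix's `bots-all.txt`, whose lines use the format IP:PORT:AS:COUNTRY. The sketch below assumes IPv4 addresses (so the IP field contains no colons) and an AS field written like `AS4134`; since that file records every connection, it counts each IP only once per AS.

```python
from collections import defaultdict


def unique_ips_per_as(lines):
    """Count distinct IPs per AS from lines formatted IP:PORT:AS:COUNTRY.

    Assumes IPv4 addresses; blank or malformed lines are skipped.
    """
    ips_by_as = defaultdict(set)
    for line in lines:
        parts = line.strip().split(":")
        if len(parts) != 4:
            continue
        ip, _port, asn, _country = parts
        ips_by_as[asn].add(ip)
    return {asn: len(ips) for asn, ips in ips_by_as.items()}


def shares(counts):
    """Return (AS, fraction of all distinct IPs) pairs, largest first."""
    total = sum(counts.values())
    return sorted(
        ((asn, n / total) for asn, n in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )
```

Usage: `with open("bots-all.txt") as f: counts = unique_ips_per_as(f)`, then iterate over `shares(counts)`.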

It’s not clear to me why there are so few bots outside the first four networks⁴, and also unclear why all bots are in Chinese IP space. We haven’t yet seen a single connection from this bot that originates outside China. The number of IPs we saw per AS is broken down below:

You can download the bot IPs we observed here; while this almost certainly doesn’t cover the entire botnet due to Deluge connection limits, it could be useful for further research.⁵ Looking through the list, you may also notice that many of the IPs are sequential, which suggests coordinated activity rather than a botnet that opportunistically infects vulnerable hosts (for which I would expect IPs to be scattered across large swaths of address space). Some examples of network blocks with many IPs connected to this botnet are:

78% of 110.189.107.0/24 was hosting a bot (200 IPs)
78% of 182.147.91.0/24 was hosting a bot (199 IPs)
44% of 121.28.99.0/25 was hosting a bot (56 IPs)
38% of 218.89.8.0/22 was hosting a bot (392 IPs)
37% of 183.198.86.0/24 was hosting a bot (94 IPs)
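The density figures above are simple ratios of distinct bot IPs to block size (for example, 200 / 256 ≈ 78% for a /24). A small sketch using Python's standard `ipaddress` module, assuming the input is an iterable of IPv4 strings such as the lines of `bot-ips.txt`; ranking by /24 in `densest_slash24s` is an arbitrary choice, since the blocks listed above also include a /25 and a /22.

```python
import ipaddress
from collections import Counter


def _parse(ips):
    """Yield IPv4Address objects from strings, ignoring blank lines."""
    for ip in ips:
        ip = ip.strip()
        if ip:
            yield ipaddress.ip_address(ip)


def subnet_density(ips, cidr):
    """Return (count, fraction) of distinct IPs from `ips` inside `cidr`.

    The fraction is count / total addresses in the block, the same simple
    ratio used in the list above.
    """
    network = ipaddress.ip_network(cidr)
    inside = {addr for addr in _parse(ips) if addr in network}
    return len(inside), len(inside) / network.num_addresses


def densest_slash24s(ips, top=5):
    """Rank /24 blocks by how many distinct IPs from `ips` they contain."""
    counts = Counter(
        ipaddress.ip_network(f"{addr}/24", strict=False) for addr in set(_parse(ips))
    )
    return [(str(net), n, n / 256) for net, n in counts.most_common(top)]
```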

Finally, we also observed many ports in use by the bots for leeching, and no specific ports were used commonly across all bots (only ephemeral ports were used).⁶ This data, as well as a few other tidbits, can be downloaded from the appendix.

via DHT

However, monitoring this botnet doesn’t need to cost terabytes of bandwidth and an unstable torrent client. Most torrent clients use DHT for peer discovery,⁷ and thankfully so does this botnet. The best option I found for this was bittorrent-dht, an easy-to-use Node.js implementation of DHT, and I set off to work with little more than the example given in its readme.
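For readers curious what a DHT peer lookup looks like on the wire, here is a sketch in Python (not the bittorrent-dht library used here) of the two pieces that matter for counting peers under Mainline DHT (BEP 5): encoding a `get_peers` query in bencode, and decoding the 6-byte compact peer entries returned in a response's `values` list. Sending the query over UDP, walking toward closer nodes, and the torrents' real info-hashes are left out; any node ID or info-hash passed in is a placeholder.

```python
import socket
import struct


def bencode(value) -> bytes:
    """Minimal bencoder for ints, str/bytes, lists and dicts (keys sorted)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def get_peers_query(transaction_id: bytes, node_id: bytes, info_hash: bytes) -> bytes:
    """Build a BEP 5 get_peers query; IDs and info-hashes are 20 bytes."""
    if len(node_id) != 20 or len(info_hash) != 20:
        raise ValueError("node_id and info_hash must be 20 bytes")
    return bencode({
        "t": transaction_id,
        "y": "q",
        "q": "get_peers",
        "a": {"id": node_id, "info_hash": info_hash},
    })


def decode_compact_peers(values):
    """Decode compact peer info: 4-byte IPv4 address + 2-byte big-endian port."""
    peers = []
    for entry in values:
        if len(entry) != 6:
            continue  # skip malformed entries
        ip = socket.inet_ntoa(entry[:4])
        (port,) = struct.unpack("!H", entry[4:])
        peers.append((ip, port))
    return peers
```

Each decoded entry is an (IP, port) pair, which is why counts gathered this way are peers rather than unique IPs.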

Collecting data over the next day revealed another interesting trait of this botnet: its connections seem to cycle with the time of day. The graph below shows the number of peers⁸ in DHT, separated by the ASNs the bot is known to be active in, during October 18th, 2021:

Given that China experiences transnational traffic congestion, it seemed unlikely to be a coincidence that traffic from this botnet tapered off before 02:00 UTC (10:00 China Standard Time) and stayed low until ~16:00 UTC (midnight CST), when it resumed. However, this was not observed consistently; some days saw this botnet hibernating all day and night, such as on October 22nd, 2021:

While the botnet’s total capacity differed from October 18th, its behavior within certain ASNs (namely AS4134 and AS24547) was consistent. In the mid-term, I will be cataloging this to find out whether it is a pattern or merely coincidence.
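The time-of-day pattern comes down to bucketing DHT observations by hour and ASN. A sketch, assuming each observation is a hypothetical (timestamp, asn, ip, port) tuple with a timezone-aware timestamp; peers are counted as distinct (IP, port) pairs, and China Standard Time is modelled as a fixed UTC+8 offset, which is safe because China does not observe daylight saving time.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# China Standard Time: fixed UTC+8, no daylight saving.
CST = timezone(timedelta(hours=8), "CST")


def hourly_peers_by_asn(observations):
    """Count distinct (IP, port) peers seen per (ASN, UTC hour) bucket.

    `observations` yields (timestamp, asn, ip, port) tuples, where the
    timestamp is a timezone-aware datetime; the layout is an assumption.
    """
    seen = defaultdict(set)
    for ts, asn, ip, port in observations:
        hour = ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        seen[(asn, hour)].add((ip, port))
    return {key: len(peers) for key, peers in seen.items()}


def to_cst(hour_utc: datetime) -> str:
    """Format a UTC hour bucket in China Standard Time, e.g. '10:00 CST'."""
    return hour_utc.astimezone(CST).strftime("%H:%M CST")
```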

Estimating Capabilities

Another interesting but difficult-to-assess question is: just how much capacity does this botnet have for evil? I tried estimating the botnet’s total download bandwidth,⁹ which might help us assess what it could be useful for. Short of actually seeing it engaged in an attack (or purchasing enough servers and bandwidth to try to outscale it), it’s not possible to know for sure. But we can make an initial estimate by multiplying some estimates of China’s average internet connection speed by the number of IPs observed in this botnet:

M-Lab: 2.08 Mbps broadband
Cisco VNI (forecast): 59.1 Mbps broadband + 34 Mbps mobile
SpeedTestNet.io: 153.49 Mbps broadband
Ookla: 196.57 Mbps broadband + 165.38 Mbps mobile

Multiplying those estimates by the number of IPs observed, we can see some initial guesses at how much bandwidth this botnet has:¹⁰

This leaves many factors untouched, such as whether one device can make use of an entire broadband connection (ex. if a low-power IoT device was infected), or whether any of these IPs represent multiple residential connections (ex. if CGNAT is in use). Since the botnet is already drawing bandwidth off our servers, we can do a little better! We can already disqualify M-Lab’s estimate (2,069 IPs * 2.08 Mbps ≈ 4.3 Gbps) as “not representative for this botnet” just by checking how much bandwidth it consumed at peak: 5 Gbps (though this is severely limited by Deluge connection limits, upstream bandwidth, peering, etc.).
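The multiplication behind these per-IP estimates is simple enough to sketch. This assumes a hypothetical mapping from AS name to observed IP count (the per-AS counts are not reproduced in the text) and applies the mobile/broadband rule described in the notes. For a source with only a broadband figure, such as M-Lab, the per-AS split doesn't matter, so the 2,069-IP total suffices.

```python
def estimate_total_mbps(ip_counts, broadband_mbps, mobile_mbps=None):
    """Estimate aggregate download capacity in Mbps.

    `ip_counts` maps an AS name to the number of bot IPs observed in it.
    IPs in ASes whose name contains "mobile" use the mobile estimate when
    one is given; everything else uses the broadband estimate.
    """
    total = 0.0
    for as_name, count in ip_counts.items():
        use_mobile = mobile_mbps is not None and "mobile" in as_name.lower()
        total += count * (mobile_mbps if use_mobile else broadband_mbps)
    return total
```

For M-Lab, `estimate_total_mbps({"all": 2069}, 2.08)` returns roughly 4,304 Mbps.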

Even better, we can run a couple of tests. To get a better estimate of the total capacity, I limited my connections to only a few hosts again and tracked the bandwidth each connection used. Over a period of an hour I observed 1.5 Mbps/connection, even with many connections to the same host (this is fairly consistent with Hacker News’ observations, though my results varied somewhat with the time of day). We can multiply this by the peak number of simultaneous peers observed on DHT that belong to this botnet (i.e. per connection, not per IP), giving 1.5 Mbps * ~40,000 = ~60 Gbps. With ~40,000 peers across 2,069 IPs, that is roughly 20 connections per IP, or 30 Mbps/IP at 1.5 Mbps each, which is far below most estimates of China’s average internet connection.

However, that estimate is too conservative to be accurate because it assumes our host alone is saturating each bot connection. In reality, there are currently between 850 and 1,750 seeders per torrent that this botnet is leeching off, and even while this botnet is active, I can saturate a multi-gigabit connection when torrenting Ubuntu myself (i.e. many seeders are well-connected). If we assume that our single server accounts for only 1/10th of the traffic needed to saturate each connection (ex. a mere nine other high-speed hosts, or comparable, are seeding simultaneously), we could estimate the true speed of the botnet at roughly 600 Gbps. This estimate translates to 300 Mbps/IP (20 connections/IP * 1.5 Mbps * 10), which would normally suggest the estimate is unreasonable; but because the density of IPs observed may indicate coordinated activity, I can’t rule it out.
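Both connection-based estimates can be reproduced in a few lines. A sketch under the text's own assumptions: 1.5 Mbps per connection, ~40,000 peak simultaneous bot peers on DHT, 2,069 observed IPs, and a `multiplier` (a name introduced here) for how many times our own server's contribution each connection actually receives, 10 in the optimistic case.

```python
def connection_estimate(mbps_per_conn, peak_conns, ip_count, multiplier=1):
    """Scale measured per-connection throughput up to the whole botnet.

    `multiplier` is how many times our own server's contribution each bot
    connection is assumed to receive in total: 1 means we alone saturate it,
    10 means our server supplies only a tenth of its traffic.
    Returns (total Gbps, connections per IP, Mbps per IP).
    """
    per_conn = mbps_per_conn * multiplier
    conns_per_ip = peak_conns / ip_count
    return per_conn * peak_conns / 1000, conns_per_ip, per_conn * conns_per_ip
```

With the exact IP count this gives ~19.3 connections and ~29 Mbps per IP (~290 Mbps with a multiplier of 10); the text rounds these to 20, 30, and 300.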

There are still uncontrolled factors that resist estimation, such as how the Great Firewall could be limiting my single-stream performance, congestion on any transcontinental routes, etc. But if the 600 Gbps estimate is anywhere near accurate, this is a reasonably well-connected and capable botnet, especially given its size.¹¹

Forming Hypotheses

Despite all of this information, it’s still not clear what the purpose of this botnet is. To recap, this botnet:

Downloads data from seeders on some, but not all, Ubuntu torrents - never uploading data.
Is trivial to identify and trivial to block, even without specialized tools.
Appears to reside 100% in Chinese IP space, and occupies >50% of some network blocks.
Does not reside in ASNs associated with cloud providers - only telecom providers.
Is well-connected, likely with hundreds of gigabits of download capacity.

First and foremost, it doesn’t make much sense for the botnet to be torrenting Ubuntu. Torrent clients verify the integrity of every chunk as it is received, so garbage data (or worse, a malicious program) sent from many hosts should simply be thrown out by the receiving client.¹² However, BitTorrent v1 uses SHA1 for integrity checks, and Google has successfully generated a SHA1 collision. Even so, it would take substantial effort for the botnet operator to “infect” the Ubuntu torrent: a malicious chunk would have to match the existing piece hashes, which is a second-preimage problem and much harder than the collisions demonstrated so far. In any case, the botnet has not been observed to attempt this (since it reports 0% completion, no peer ever requests chunks from it), and manually verifying the downloaded ISO against a known-good SHA-256 hash would detect tampering.
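To make the integrity point concrete: in a v1 torrent, the `pieces` field is the concatenation of one 20-byte SHA1 digest per piece, and a client keeps a received piece only if its digest matches. A minimal sketch with Python's `hashlib`; the piece length and in-memory payload are illustrative, whereas real clients hash data as it arrives.

```python
import hashlib


def split_pieces(data: bytes, piece_length: int):
    """Split a payload into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes_v1(data: bytes, piece_length: int) -> bytes:
    """Concatenated 20-byte SHA1 digests, as stored in a v1 torrent's `pieces`."""
    return b"".join(hashlib.sha1(p).digest() for p in split_pieces(data, piece_length))


def verify_piece(index: int, piece: bytes, pieces_field: bytes) -> bool:
    """Accept a received piece only if its SHA1 matches the torrent's hash.

    The decision depends solely on the hash in the .torrent file, not on how
    many peers sent the same data.
    """
    expected = pieces_field[20 * index: 20 * (index + 1)]
    return len(expected) == 20 and hashlib.sha1(piece).digest() == expected
```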

For now, the botnet isn’t trying to upload bad chunks to peers at all. The opposite, in fact: its configuration and capabilities currently only allow the botnet operator to download data. This wastes legitimate seeders’ bandwidth, raising the cost of seeding this torrent and slowing downloads for others, but doesn’t really impede the torrent’s use, doubly so since the botnet is easy to block by enabling encryption. When this had only been happening for a day, I suspected the wasted bandwidth might be a public demonstration of the botnet’s DDoS capacity (as it is bandwidth- and connection-rich) in a way that was unlikely to cause significant other disruption; but now that the torrenting has carried on for over a week, that seems unlikely.

The best hypothesis at this time for why this botnet is torrenting Ubuntu is that it may not actually intend to torrent Ubuntu at all. Massive thanks to @rx13, who suggested that perhaps this is some sort of Command-and-Control mechanism, where the bots receive data from the C2 server by downloading intentionally bad chunks from seeders. This could be harder to censor or interfere with than traditional C2 frameworks, which rely on a few centralized servers. However, the bots are quite obvious to spot due to their behavior, and the number of connections and bandwidth consumed are incredibly wasteful. Perhaps this is a new C2 mechanism under active development, which may be supported by sudden drops and spikes in botnet activity.

The final and most difficult question is: who is running this? Given the centralization and density of IPs in China that this botnet has been observed from, it’s tempting to jump to “state-sponsored botnet!” Honestly, I would believe it, but I don’t have any evidence to claim this definitively. It may be that there are private actors within China who need this kind of centralized botnet. Or perhaps the idea that this is a C2 mechanism is correct, but the botnet operator is intentionally using it only in China because other C2 mechanisms they rely on are being blocked by the Great Firewall.

Conclusions

While I’m disappointed that I haven’t been able to conclusively identify who is operating this botnet or why, I’m glad to share my findings and to have had the chance to learn on my own and tinker with DHT. By publishing this, I hope to encourage people to take their own research skills to bat against this botnet, learn more about how BitTorrent works (and why you should make sure your client supports BitTorrent v2!), and consider hosting their own Ubuntu seeds to help make sure everyone has access to free software.

As of writing, the botnet was still siphoning data off my Ubuntu ISOs, albeit infrequently. I’ve included the torrent files you would need to take a look at this botnet, as well as the SHA-256 sums of the resulting ISOs so you can re-validate that your downloaded ISO is legitimate. Though again, as of writing, there is no reason to suspect this botnet operator means to attack the integrity of Ubuntu torrents. If the botnet is shut down because of the increased publicity, we may never find out whodunnit, but forcing them to change tactics or go underground is likely a win.

I’m going to keep an eye on this to see if that happens, and hope to break down more interesting information, such as:

Analyzing which services were running on the bots’ IPs over the past year (as observed by Shodan), to see if there are common themes.
Continually observing DHT to look for new IPs joining or leaving the botnet over time, as well as trying to identify any predictable periods of activity.
Building my own botnet-looking client to see if there are one or few peers that send bad data to me (which could confirm C2 activity).

If you are inspired by this piece to do some research of your own, please consider publishing it and let me know where I can find it! I am available via my contact page. You can also see where I tweeted (remember twitter?) my early observations on this botnet - and would love to read your analysis, hot takes, and crazy hypotheses.

Appendix

Most data presented in this article can be downloaded raw. This data was collected off three hosts in AS200052 (Feral.io Ltd),¹³ though I am now monitoring from different infrastructure.

bot-ips.txt (29 KB): A list of all bot IPs that were observed connecting to our Deluge instances. This is referenced elsewhere in the article as well.
bot-ports.txt (495 KB): A presorted list of the number of times a given port was used by a bot when connecting to our Deluge instances. This may be biased towards or against some ports due to the different connection settings used on different hosts.
bots-all.txt (17.4 MB): Every bot connection that we observed connecting to our Deluge instances, with the format IP:PORT:AS:COUNTRY.

The DHT data collected is not being distributed, as I did not capture the client ID (sorry) and don’t want to distribute non-bot data. Did I miss something when breaking down the data that is downloadable? Reach out via my contact page.

Impacted Ubuntu Torrents

I am including the torrent files and SHA-256 sums of the resulting ISOs so you can observe the botnet and/or re-validate that your downloaded ISO is legitimate.

$ sha256sum *
f8e3086f3cea0fb3fefb29937ab5ed9d19e767079633960ccb50e76153effc98  ubuntu-20.04.3-live-server-amd64.iso
72519a9586656359862091c9ac46511a0a03c4756fee145776f2175f73c9aebf  ubuntu-20.04.3-live-server-amd64.iso.torrent
3ef833828009fb69d5c584f3701d6946f89fa304757b7947e792f9491caa270e  ubuntu-20.10-desktop-amd64.iso
fabe94efe3d12d5508f1a5eeb9172c70280cc9c43a24ef2065b8e7fb7974a08f  ubuntu-20.10-desktop-amd64.iso.torrent
fa95fb748b34d470a7cfa5e3c1c8fa1163e2dc340cd5a60f7ece9dc963ecdf88  ubuntu-21.04-desktop-amd64.iso
4377c0ec9e28d582e0f3014d47916c5a6b949fd9da18ef87a729628c6546719f  ubuntu-21.04-desktop-amd64.iso.torrent
e4089c47104375b59951bad6c7b3ee5d9f6d80bfac4597e43a716bb8f5c1f3b0  ubuntu-21.04-live-server-amd64.iso
0e2e73cdd525fc45c99f7fc72b357fc477b05cf00d7d779e70ce071ce91cca53  ubuntu-21.04-live-server-amd64.iso.torrent
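The same check can also be scripted. A minimal sketch that hashes an ISO in 1 MiB chunks, so multi-gigabyte files never have to fit in memory, and compares the result with one of the published sums above; the chunk size is an arbitrary choice, and `sha256sum -c` against a file of sums does the same job from the shell.

```python
import hashlib
import sys


def sha256_chunks(chunks) -> str:
    """SHA-256 hex digest over an iterable of byte chunks.

    Feeding chunks one at a time yields the same digest as hashing the
    joined bytes, without holding the whole file in memory.
    """
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file using fixed-size reads."""
    with open(path, "rb") as f:
        return sha256_chunks(iter(lambda: f.read(chunk_size), b""))


def same_digest(actual_hex, expected_hex):
    """Compare hex digests, ignoring case and surrounding whitespace."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python <script> <iso-path> <expected-sha256>
    ok = same_digest(sha256_file(sys.argv[1]), sys.argv[2])
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

For example, pass `ubuntu-21.04-desktop-amd64.iso` and `fa95fb748b34d470a7cfa5e3c1c8fa1163e2dc340cd5a60f7ece9dc963ecdf88` as the two arguments.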

Author’s Notes

¹ Deluge is a lightweight, FOSS BitTorrent client.
² Some of us are undoubtedly used to “Linux ISOs” as an internet euphemism for pirated media (ref), but these really are just Linux ISOs. At this time, I haven’t seen any reports of this botnet showing up on any other torrents, especially not illicit content.
³ That’s not the only option, of course. If this becomes a major or long-term issue, it would also be trivial to write static or behavioral rules to block it, such as blocking unknown client strings, or rate-limiting clients that have received more than a chunk’s worth of data from your client but never report it as complete.
⁴ For comparison, AS4134 (hosting ~47% of IPs observed) currently announces ~112 million IPs, while AS4812 (hosting <0.1% of IPs observed) announces ~8.6 million IPs. Assuming both host systems that are equally likely to be infected, we should expect AS4812 to host closer to ~3.6% of the IPs observed. However, given that so many of the IPs observed in the botnet were sequential, this may be coordinated somehow.
⁵ This article is for research purposes only; this list will not be kept up to date. I am looking into tracking this longer-term, but would create a separate distribution method for it. While you can do whatever you want with this list, I definitely wouldn’t recommend polling it for updates, and I wouldn’t recommend adding it to block lists unless you have a specific reason to.
⁶ I’m not exactly sure what to make of the different ports in use, though since no single port or small set of ports dominated, I don’t think it’s especially valuable information.
⁷ Mainline DHT (referenced as “DHT” throughout this article; yes, I’m lazy, I know) is a distributed hash table based on Kademlia. If you’re interested in learning how Mainline DHT works, I strongly recommend the spec alongside Angel Leon’s notes.
⁸ This graph shows peers, which are unique clients (IP and port combinations), not IPs.
⁹ Please note that this is download capacity, not upload capacity. A botnet’s ability to DDoS targets is generally measured by its upload capabilities (ex. in Mbps, PPS, or RPS) multiplied by any amplification factors it can utilize.
¹⁰ Where broadband and mobile estimates were available, the mobile estimate was used for all IPs allocated to ASNs with “mobile” in the name; otherwise, I just used the broadband estimate. It’s hamfisted, but it’s not going to be as precise as my later estimate via connections anyway.
¹¹ As a reminder, none of these estimates should be considered absolute, and I welcome any/all concerns about them. I’ve explained the factors that I think would be significant, but I could very well have missed some, over- or underestimated their impact, etc.
¹² To reiterate, torrent chunk integrity is not decided by majority, quorum, etc. Torrent files include a hash of each piece: torrent v1 uses SHA1 (which is vulnerable to collisions) and torrent v2 uses SHA-256 (not currently vulnerable to collisions).
¹³ Big thanks to Feral Hosting for putting up with dumb research and literal Linux ISOs using … checks traffic stats … 15TB of bandwidth in one week. Sorry.
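The expected-share comparison between AS4134 and AS4812 in the notes above is a simple proportion: if infection were equally likely everywhere, each AS's share of bots would scale with the number of IPs it announces. A sketch of that arithmetic using the figures given there; the function name is introduced here for illustration.

```python
def expected_share(observed_share_a, announced_a, announced_b):
    """Share AS B 'should' hold if bots scaled with announced address space.

    Assumes hosts in both ASes are equally likely to be infected.
    """
    return observed_share_a * announced_b / announced_a


if __name__ == "__main__":
    # AS4134: ~47% of bot IPs, ~112M announced; AS4812: ~8.6M announced.
    print(f"{expected_share(0.47, 112e6, 8.6e6):.1%}")
```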