These are the most popular domains mentioned in the Collections 1-5, ANTIPUBLIC, MYR, and Zabugor breach compilations, sorted from most to least referenced. I computed these to hunt effectively for mail servers whose domains had expired (so I could register them, protecting innocent former customers), but you might also find this useful if you’re interested in:
- Rough data about what email servers are most popular (see Biases, below),
- Studying what typos people are likely to make, and security implications of such, or
- Maybe something else!
Contents
File Tree
Most Popular Email Domains in Collections 1-5 & ANTIPUBLIC & MYR & Zabugor/
email_domains_by_popularity.zip
README.md
Example
email_domains_by_popularity.zip contains email_domains_by_popularity.txt, a text file containing lines in the format:
count1 domain1
count2 domain2
count3 domain3
Where count is the number of times that domain appeared across the Collections 1-5, ANTIPUBLIC, MYR, and Zabugor breaches. All lines are ‘clean’: they contain no other data, the domains are confirmed to be valid, etc.
For example, the first ten lines (“top ten email domains”) are:
3597346948 yahoo.com
2397990516 hotmail.com
2166598264 gmail.com
1952784572 mail.ru
639286698 aol.com
587539625 yandex.ru
471155911 rambler.ru
297777628 bk.ru
234879450 live.com
233658522 list.ru
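If you want to work with the file programmatically, the format above is simple to parse. The following is a minimal Python sketch, not part of the dataset: the function names `parse_line` and `top_domains` are made up for illustration, and it assumes each line is a count, whitespace, then a domain.

```python
from typing import Iterable, Iterator, Tuple


def parse_line(line: str) -> Tuple[int, str]:
    """Split one 'count domain' line into (count, domain)."""
    count_text, domain = line.strip().split(None, 1)
    return int(count_text), domain


def top_domains(lines: Iterable[str], limit: int) -> Iterator[Tuple[int, str]]:
    """Yield the first `limit` parsed entries (the file is already sorted)."""
    for index, line in enumerate(lines):
        if index >= limit:
            break
        yield parse_line(line)


# Sample lines copied from the top of the list above.
sample = [
    "3597346948 yahoo.com",
    "2397990516 hotmail.com",
    "2166598264 gmail.com",
]

for count, domain in top_domains(sample, 2):
    print(f"{domain}: {count:,}")
```

To run this against the real data, pass an open file handle for the extracted email_domains_by_popularity.txt instead of `sample`. Because the function reads lazily, it does not need to load the whole file into memory.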
Notes
Methodology
To generate this list, all of these breach compilations were read, valid domains were extracted, and every mention was counted: 1 mention of a domain = 1 count for that domain. I did no other filtering, no aggregation by the username associated with each domain, etc.
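The read, extract, and count steps can be sketched roughly as follows. This is an illustration only, not the tooling actually used to build the list. The validity rule (dot-separated labels ending in an alphabetic TLD), the delimiter set, and the lowercasing are all assumptions, and the sample lines are invented.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List

# Assumption: the domain is whatever follows '@' up to a common dump delimiter.
CANDIDATE_RE = re.compile(r"@([^\s@:;,|]+)")
# Assumption: a "valid" domain is dot-separated labels plus an alphabetic TLD.
VALID_DOMAIN_RE = re.compile(r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}")


def extract_domains(line: str) -> Iterator[str]:
    """Yield every syntactically valid domain mentioned in one line."""
    for candidate in CANDIDATE_RE.findall(line):
        domain = candidate.lower()  # assumption: counts are case-insensitive
        if VALID_DOMAIN_RE.fullmatch(domain):
            yield domain


def count_domains(lines: Iterable[str]) -> Counter:
    """Count one mention per occurrence, with no deduplication by username."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(extract_domains(line))
    return counts


def format_counts(counts: Counter) -> List[str]:
    """Render 'count domain' lines, most mentioned first."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{count} {domain}" for domain, count in ranked]


# Invented sample lines in a typical 'email:password' style.
lines = [
    "alice@yahoo.com:hunter2",
    "bob@Yahoo.com;pw",
    "carol@gmail.com",
    "not an email",
    "dave@bad_domain",
]
print("\n".join(format_counts(count_domains(lines))))
```

Repeated lines are deliberately counted again here, which mirrors the "1 mention = 1 count" rule and is also why duplicated files inflate some domains.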
Biases
This dataset is heavily biased because of the various sources used.
- Some domains are overrepresented compared to their real-world popularity:
  - ex. Yahoo’s entire customer base from when they were breached
  - ex. Some lines and files were repeated by the breach compilers
  - ex. A single highly-online user being counted many times across many breaches
- Some domains are underrepresented.
  - Newer domains are likely underrepresented because of the age of these breaches/compilations.
  - Breached companies are not uniformly distributed geographically.
- Some domains never really existed in the first place.
  - Spammers, bots, etc. might try signing up with an automated email server.
  - Anonymous people might try signing up with a nonexistent email server.
- Many domains were typoed.
  - ex. yaho.com instead of yahoo.com
- Some data was corrupted, and had to be discarded.
  - Breaches in breach compilations are rarely pristine.
Etc. It is not a definitive measure of “this email provider is the most popular” or anything like that - it only tells you that these domains were the most popular as found in this breached data.
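If you are studying typos such as yaho.com, one possible starting point is to flag domains that sit within a small edit distance of a very popular domain. The sketch below is a hypothetical approach, not something used to build this dataset: `edit_distance` and `typo_candidates` are illustrative names, the counts in the sample are made up, and a near miss may well be a legitimate, unrelated domain.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def typo_candidates(
    counts: Dict[str, int],
    popular: Sequence[str],
    max_distance: int = 1,
) -> List[Tuple[str, int, str]]:
    """Return (domain, count, popular_domain) for near misses of popular domains."""
    found = []
    for domain, count in counts.items():
        if domain in popular:
            continue
        for target in popular:
            if edit_distance(domain, target) <= max_distance:
                found.append((domain, count, target))
                break
    return sorted(found, key=lambda item: -item[1])


# Made-up counts, for illustration only.
sample_counts = {"yahoo.com": 100, "yaho.com": 5, "gmail.com": 90, "example.org": 7}
print(typo_candidates(sample_counts, ["yahoo.com", "gmail.com"]))
```

Comparing every domain against every popular domain is quadratic, so on the full list it is probably worth restricting `popular` to a short list of the top entries.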
HTTPS Download
If you cannot use a torrent client, you can download email_domains_by_popularity.zip from my GetRight server, though this is limited to 500 KB/s per client.