Experimental Suspicious Domain Feed
We have had a "newly registered domain" feed for a few years. This feed pulls data from ICANN's Centralized Zone Data Service (CZDS, https://czds.icann.org) and from TLS certificate transparency logs.
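As a rough illustration of the zone-file side, the sketch below compares two daily snapshots of a TLD zone file and reports names that are delegated today but were not delegated yesterday. It assumes gzip-compressed master-file text with one record per line in the form `owner TTL class type rdata`. That layout, the function names `delegated_domains` and `new_domains`, and the diff-based approach are assumptions for illustration, not a description of our actual pipeline.

```python
import gzip
from typing import Iterable, Set


def delegated_domains(lines: Iterable[str]) -> Set[str]:
    """Collect the owner names of NS records from zone-file lines.

    Assumption: one record per line, laid out as
    owner TTL class type rdata (whitespace separated).
    """
    domains = set()
    for line in lines:
        fields = line.split()
        # Skip comments, directives such as $ORIGIN, and short lines.
        if len(fields) < 5 or fields[0].startswith(";"):
            continue
        owner = fields[0].lower().rstrip(".")
        rtype = fields[3].lower()
        # "." in owner excludes the TLD apex itself (e.g. "com.").
        if rtype == "ns" and "." in owner:
            domains.add(owner)
    return domains


def new_domains(today_path: str, yesterday_path: str) -> Set[str]:
    """Names delegated in today's snapshot but not in yesterday's (hypothetical helper)."""
    with gzip.open(today_path, "rt", encoding="utf-8", errors="replace") as today, \
         gzip.open(yesterday_path, "rt", encoding="utf-8", errors="replace") as yesterday:
        return delegated_domains(today) - delegated_domains(yesterday)
```

A set difference keeps the logic simple; a production system handling large zones would more likely store known names in a database rather than rebuild both sets every day.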
The ICANN CZDS is a good start, but it only offers data from top-level domains that collaborate with ICANN. In particular, country-code top-level domains are missing. Country-code zone files can be hard to come by, so we use TLS certificate transparency logs as a "cheap" alternative. By default, pretty much all domain registrars create a "parked" website for a new domain and obtain a certificate for it. Even if they do not, any halfway self-respecting phishing site will use TLS and obtain a certificate from a public certificate authority at some point. The TLS certificate transparency logs also help capture older domains.
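The sketch below shows, under simplifying assumptions, how the DNS names found in certificates could be reduced to candidate domains. It takes the names (for example, subject alternative names) as plain strings; fetching and parsing the CT log entries themselves is out of scope. The function name `candidate_domains` is hypothetical, and the "last two labels" rule is a deliberate shortcut: it mishandles country-code suffixes such as `co.uk`, which is exactly where a real implementation would consult the Public Suffix List.

```python
from typing import Iterable, List, Set


def candidate_domains(cert_names: Iterable[str], known_domains: Set[str]) -> List[str]:
    """Reduce certificate DNS names to registered domains not seen before.

    Assumption: the registered domain is approximated by the last two
    labels. This is wrong for suffixes like co.uk; use the Public Suffix
    List for real data.
    """
    found = set()
    for name in cert_names:
        name = name.strip().lower().rstrip(".")
        if name.startswith("*."):
            name = name[2:]  # drop the wildcard label
        labels = name.split(".")
        if len(labels) < 2 or not all(labels):
            continue  # skip bare hostnames and names with empty labels
        registered = ".".join(labels[-2:])
        if registered not in known_domains:
            found.add(registered)
    return sorted(found)
```

For example, `candidate_domains(["*.perumice.com", "www.perumice.com", "mail.example.com", "localhost"], {"example.com"})` reduces four certificate names to the single new candidate `perumice.com`.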
Each day, we capture around 250,000 new domains using this system. Of course, we want to know which of these domains are used for malicious purposes, and as the sample below shows, there are a lot of "odd" domain names.
domainname |
---|
jgcinversiones.com |
h20manager.net |
1sbrfreebet.com |
stability.now |
mdskj.top |
internationalone19.com |
clistrict196.org |
agenteinsider.com |
720airpano.com |
dhofp.tax |
bos228btts.lol |
japansocialmarketing.org |
mummyandimedia.com |
1dyzfd.buzz |
oollm.shop |
snapztrailk.store |
perumice.com |
nrnmy.sbs |
commaexperts.com |
softfragments.com |
So I searched for some commonly used criteria to identify "bad" domain names, and found these:
- Is the domain name very short or very long?
- What is the entropy of the domain name (is it just random characters)?
- Does it contain a lot of numbers or hyphens?
- Is it an internationalized domain name (IDN), and if so, is it valid? Does it mix different scripts (for example, Latin and Cyrillic)?
- Does it contain keywords like "bank" or "login" that are often used with phishing sites, or brand names like "Apple" or "Google"?
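The criteria above can be turned into per-domain features. The sketch below is one possible way to compute them; it is not the code behind our score. The keyword list is hypothetical, IDN validity is checked with Python's built-in `idna` codec (which implements IDNA 2003, not IDNA 2008), and script detection is a rough approximation that reads the first word of each character's Unicode name, because the standard library has no Unicode script property.

```python
import math
import unicodedata
from collections import Counter
from typing import Optional, Set

# Hypothetical keyword and brand list, for illustration only.
SUSPICIOUS_KEYWORDS = ("bank", "login", "secure", "verify", "apple", "google")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def decode_idn(name: str) -> Optional[str]:
    """Unicode form of the name, or None if it is not valid IDNA 2003."""
    try:
        return name.encode("idna").decode("idna")
    except UnicodeError:
        return None


def scripts_used(text: str) -> Set[str]:
    """Rough script guess: 'LATIN SMALL LETTER A' -> 'LATIN'."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


def domain_features(domain: str) -> dict:
    # Drop the TLD and score only the remaining labels.
    name = domain.strip().lower().rstrip(".").rsplit(".", 1)[0]
    decoded = decode_idn(name)
    text = decoded if decoded is not None else name
    chars = text.replace(".", "")
    return {
        "length": len(chars),
        "entropy": shannon_entropy(chars),
        "digits": sum(ch.isdigit() for ch in text),
        "hyphens": text.count("-"),
        "valid_idn": decoded is not None,
        "mixed_scripts": len(scripts_used(text)) > 1,
        "keywords": [k for k in SUSPICIOUS_KEYWORDS if k in text],
    }
```

For instance, `domain_features("bos228btts.lol")` reports 3 digits and an entropy of about 2.52 bits per character, while a name like `p\u0430ypal.com` (with a Cyrillic "а") is flagged as mixing scripts.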
We have now added a score to each domain name that can be used to rank them based on these criteria. You can find a daily report here, and the score was added to our "recentdomain" API feed. This is experimental, and the exact algorithm we use for the score will change over time.
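The exact scoring formula is not spelled out here, and it will change over time. A weighted sum is one simple way to turn such features into a single number that can be used for ranking; the sketch below assumes that approach. The feature names, the length band, the entropy threshold, the weights, and the feature values in `SAMPLE` are all hypothetical and hand-made, not measured output of our feed.

```python
from typing import Dict, List, Tuple

# Hypothetical weights, for illustration only.
WEIGHTS: Dict[str, float] = {
    "odd_length": 1.0,     # name shorter than 6 or longer than 20 characters
    "entropy": 2.0,        # per bit/char of entropy above 3.0
    "digits": 0.5,         # per digit
    "hyphens": 0.5,        # per hyphen
    "invalid_idn": 3.0,
    "mixed_scripts": 3.0,
    "keyword": 2.0,        # per matched keyword or brand name
}


def suspicion_score(f: dict) -> float:
    """Weighted sum of features; higher means more suspicious."""
    score = 0.0
    if f["length"] < 6 or f["length"] > 20:
        score += WEIGHTS["odd_length"]
    score += WEIGHTS["entropy"] * max(0.0, f["entropy"] - 3.0)
    score += WEIGHTS["digits"] * f["digits"]
    score += WEIGHTS["hyphens"] * f["hyphens"]
    score += WEIGHTS["invalid_idn"] * (not f["valid_idn"])
    score += WEIGHTS["mixed_scripts"] * f["mixed_scripts"]
    score += WEIGHTS["keyword"] * len(f["keywords"])
    return score


def rank(features_by_domain: Dict[str, dict]) -> List[Tuple[str, float]]:
    """Domains sorted from most to least suspicious."""
    scored = [(d, suspicion_score(f)) for d, f in features_by_domain.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hand-made feature values; "secure-login-bank.com" is a hypothetical domain.
SAMPLE = {
    "commaexperts.com": {"length": 12, "entropy": 3.25, "digits": 0, "hyphens": 0,
                         "valid_idn": True, "mixed_scripts": False, "keywords": []},
    "bos228btts.lol": {"length": 10, "entropy": 2.52, "digits": 3, "hyphens": 0,
                       "valid_idn": True, "mixed_scripts": False, "keywords": []},
    "secure-login-bank.com": {"length": 17, "entropy": 3.73, "digits": 0, "hyphens": 2,
                              "valid_idn": True, "mixed_scripts": False,
                              "keywords": ["bank", "login", "secure"]},
}
```

With these values, `rank(SAMPLE)` puts the keyword-heavy name first and `commaexperts.com` last. A weighted sum is easy to tune as the criteria evolve, but the weights need to be checked against domains that later turned out to be malicious.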
We used to have an "old" suspicious domain feed that was mostly based on correlating a few third-party feeds, but over time, these feeds went away or became commercial, and we could no longer use them.
Feedback is very welcome.
---
Johannes B. Ullrich, Ph.D. , Dean of Research, SANS.edu