
We did a large-scale study of this phenomenon recently: https://www.cs.bu.edu/faculty/crovella/paper-archive/wung-if...

Across a broad sample of typo domains of major sites, most registered domains aren’t actually reachable, implying they are registered for defensive, legitimate, or unrelated purposes. Interestingly, the typo space on major sites is actually very sparsely registered (2% at edit distance 1), meaning that typosquatting may actually be underexploited.



>Interestingly, the typo space on major sites is actually very sparsely registered (2% at edit distance 1), meaning that typosquatting may actually be underexploited.

Anecdotally, the autosuggestions and improved browsing history recommendations may mean this is way less lucrative than it used to be.

Also, anyone doing search-like behaviour in their address bar is far more likely to see a knowledge-panel-style reply for prominent websites than the 10-blue-links format of historical search engine results, which may have included the nefarious domains.

I'd leap to say that because of this, users find their intended domain by using natural language far more than they used to.


I would argue that it is 100% searching in the address bar. Mobile has trained people to do that, and search results are usually solid enough to take you to the right place.


Mobile? Haven’t all desktop browsers switched to having the address bar do search unless you type a fully qualified domain, since Chrome came out c. 2009? Before that there were 2 fields at the top of browser windows, but Firefox and Safari ditched those pretty quickly after Chrome.


Yeah, I'd lean towards a high % also; it would take some time to prove it.

Also, homograph attacks are likely much less of a thing for the above reasons.


A possible explanation for why typos of major sites are sparsely registered is that the domain industry has put a lot of focus over the last decade on addressing malicious registrations. Many registrars that target the large-company market segment sell products that monitor for malicious registrations, with a legal response in case one pops up. It also seems that bulk registrars have gotten better filters to reduce malicious registrations, which is a service some security companies offer to registrars. In theory it should be considerably more difficult today for a malicious actor to go to a major registrar and buy an obviously trademark-infringing domain for a major site.

Domain/trademark monitoring also directly competes with defensive registrations. Often it is a question of whether you want to pay for the lawyers/monitoring service, a large number of registration/renewal fees, or both.


My guess is also that not all typos are equal. There should be a stricter edit definition restricted to 1-keystroke-away edits (that is, delete, swap, or add/replace using a key one key away) instead of pure Levenshtein. For example, Fqcebook is a more likely typo than Fjcebook, but they are both edit-1.


Someone should make a qwertyshtein() function.
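For what it's worth, here is a rough sketch of what such a function could look like. It is only an illustration: the key coordinates, the 1.3 adjacency threshold and the cost scheme (adjacent-key replace costs 1, any other replace costs 2, insert/delete/swap cost 1) are my own assumptions for a staggered US QWERTY layout, not anything taken from the paper. Insertions are not keyboard-filtered here, to keep it short.

```python
import math

# Approximate US QWERTY geometry: (row, horizontal offset in key widths).
# The offsets are an assumption modelled on a staggered desktop layout.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
POS = {ch: (x + off, y)
       for y, (row, off) in enumerate(ROWS)
       for x, ch in enumerate(row)}


def adjacent(a: str, b: str) -> bool:
    """True if a and b are distinct letter keys that sit next to each other."""
    if a == b or a not in POS or b not in POS:
        return False
    return math.dist(POS[a], POS[b]) < 1.3  # assumed adjacency threshold


def qwertyshtein(s: str, t: str) -> int:
    """Restricted Damerau-Levenshtein distance with keyboard-aware replaces."""
    s, t = s.lower(), t.lower()
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i
    for j in range(len(t) + 1):
        d[0][j] = j
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            a, b = s[i - 1], t[j - 1]
            # Replacing with a neighbouring key is cheap; anything else costs
            # the same as a delete followed by an insert.
            sub = 0 if a == b else (1 if adjacent(a, b) else 2)
            d[i][j] = min(d[i - 1][j] + 1,        # delete
                          d[i][j - 1] + 1,        # insert
                          d[i - 1][j - 1] + sub)  # replace / match
            if i > 1 and j > 1 and a == t[j - 2] and s[i - 2] == b:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap
    return d[len(s)][len(t)]
```

Under these assumptions, qwertyshtein("facebook", "fqcebook") is 1 while qwertyshtein("facebook", "fjcebook") is 2, which is the distinction being asked for. A phone keyboard would need different offsets.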


If I understand correctly from the paper, what qualifies as an edit distance of 1 is pure Levenshtein distance 1, right?

Just curious because, while the edit-1 space can be fairly big, I’d assume the edits have very different probabilities. So the squatted domains probably skew towards the higher-probability edits. By that I mean mostly keyboard typos, e.g. on a phone: the “cwt” typo is more likely than “cpt” for “cat” because of the keyboard proximity of a and w. I wonder what the squatting rate is when you filter for edits within one keystroke, for example (this only really changes the add and replace types of edits, not delete or swap).
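To make the filtering idea concrete, here is a small sketch that enumerates the edit-1 candidates for a label, either over the whole alphabet or restricted to neighbouring keys. The layout coordinates, the 1.3 threshold and the rule for insertions (the stray key is next to, or a repeat of, one of the keys around the insertion point) are assumptions of mine, and digits and hyphens are left out for simplicity. This is not the paper's methodology.

```python
import math
import string

# Approximate staggered US QWERTY layout (assumed geometry, in key widths).
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
POS = {ch: (x + off, y)
       for y, (row, off) in enumerate(ROWS)
       for x, ch in enumerate(row)}
ALPHABET = string.ascii_lowercase  # digits and "-" omitted for simplicity


def neighbours(ch: str) -> set:
    """Letter keys next to ch under the assumed layout."""
    if ch not in POS:
        return set()
    return {k for k in POS if k != ch and math.dist(POS[k], POS[ch]) < 1.3}


def edit1(label: str, keyboard_only: bool = False) -> set:
    """All labels one delete, swap, replace or insert away from label."""
    out = set()
    n = len(label)
    for i in range(n):                        # delete (never filtered)
        out.add(label[:i] + label[i + 1:])
    for i in range(n - 1):                    # swap (never filtered)
        out.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    for i, ch in enumerate(label):            # replace
        choices = neighbours(ch) if keyboard_only else ALPHABET
        for c in choices:
            out.add(label[:i] + c + label[i + 1:])
    for i in range(n + 1):                    # insert
        if keyboard_only:
            around = label[max(i - 1, 0):i + 1]
            choices = set(around).union(*(neighbours(ch) for ch in around))
        else:
            choices = ALPHABET
        for c in choices:
            out.add(label[:i] + c + label[i:])
    out.discard(label)
    out.discard("")
    return out
```

Comparing len(edit1("facebook")) with len(edit1("facebook", keyboard_only=True)) gives a feel for how much smaller the plausible space is; the registered fraction of each set would then have to be measured against real zone data.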


> Interestingly, the typo space on major sites is actually very sparsely registered (2% at edit distance 1)

It seems to me that "edit distance 1" still describes some very implausible typos.


Yeah corner and comer is an edit distance of 2 but perhaps more lucrative than corner and corker, as a bad example.


I saw rnicrosoft in use the other day, somewhere.
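Look-alike swaps like that one are easy to enumerate separately from keyboard typos. A minimal sketch, where the substitution table is just a handful of pairs I picked for illustration (it is nowhere near a complete confusables list):

```python
# Illustrative look-alike pairs only; this table is an assumption, not a
# complete or authoritative confusables list.
LOOKALIKES = {
    "m": ["rn"], "rn": ["m"],
    "w": ["vv"], "vv": ["w"],
    "d": ["cl"], "cl": ["d"],
    "l": ["1"], "o": ["0"],
}


def lookalike_variants(label: str) -> set:
    """Labels produced by one visual substitution anywhere in label."""
    out = set()
    for src, repls in LOOKALIKES.items():
        start = label.find(src)
        while start != -1:
            for r in repls:
                out.add(label[:start] + r + label[start + len(src):])
            start = label.find(src, start + 1)
    return out
```

With this table, lookalike_variants("microsoft") contains "rnicrosoft" and lookalike_variants("corner") contains "comer", even though both are edit distance 2 under plain Levenshtein.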


Yes, Levenshtein in that case gives too big an exploration space. A keyboard edit distance would probably work better. Delete and swap would still count as 1, but replace and add should be limited to, say, keys at most 1 key away.


"... meaning that typosquatting may actually be underexploited."

Missing from the paper is an examination of web user behaviour

Over time, so-called "direct navigation" where the domain name, e.g., example.com, was typed into the browser address bar, has declined. By the time Google terminated "Adsense for domains" in 2012 IMO it had managed to systematically subsume most of the traffic and associated revenue from the typosquatting/domain parking racket

https://web.archive.org/web/20250320184725if_/https://domain...

With the introduction of the so-called "omnibar" or "omnibox" in Firefox^1 and Chrome, typographical errors in domain names are submitted as "searches" to a company that sells ad services. For example, Safari, Firefox and Chrome all send search traffic to Google, LLC. From the DoJ antitrust litigation we know that Google has been paying ridiculously large sums of money to various companies for this traffic

1. Firefox originally called this the "awesome bar"

https://web.archive.org/web/20250927011424if_/https://www.cn...

Not to mention increasingly common user practice of direct navigation to a search engine webpage, e.g., google.com, then searching for the desired website, e.g., example.com

As everyone knows, one company, in some cases through acquisitions and/or anticompetitive conduct, came to control 1. search, 2. "the web browser", 3. online advertising services on the open web, 4. operating systems (mobile, "chromebook"), ...

If parked domains only get traffic from "direct navigation",^2 then it stands to reason that such traffic has declined as it has been increasingly captured by advertising-sponsored "default browsers" and, ultimately, Google. IMO, it makes sense that domain parking as a means of delivering ads and generating revenue would give way to these domains becoming unregistered or registered to malware distributors or the like

What are the registration histories for the unregistered edit distance 1 typosquatting domains? Consider the number that are "currently unregistered" versus "never before registered"

2. Perhaps the registrants are using other ways to send traffic to these domains



