An internationalized domain name (IDN) homograph attack is a method of deceiving computer users about the remote computer they’re communicating with. It exploits the fact that many characters are homographs, meaning they look alike. Homographs allow a malicious party to create an IDN that appears very similar to an established domain, which can then be used to lure users to the new website. Alphabets that have the largest number of homographs with the Latin alphabet are generally the most useful for perpetrating an IDN homograph attack since most top-level domains (TLDs) use the Latin alphabet. A variety of defenses are currently available for protecting computer users from these attacks.
Overview
An IDN homograph attack is similar to another type of domain name spoofing known as typosquatting. Both techniques attempt to deceive users with a new domain name that resembles an established name, although they exploit different types of similarity. Typosquatting uses a new domain name that’s spelled differently from the established name but drawn from the same character set. A homograph attack typically uses a domain name that contains characters from other character sets, so it depends on the user clicking a hyperlink to the new name. This type of attack rarely works when the domain name is entered manually, since a user is unlikely to type a homograph by accident.
Some domain names can be used for both typosquatting and homograph spoofing. For example, a spoof that uses a domain name containing an uppercase “O” instead of the numeral “0” would be both types of attack. The success of this type of spoof is highly dependent on the typeface the computer uses, as these two characters are physically identical in some typefaces.
Internationalized Domain Names
An IDN is a domain name that’s displayed in at least one language-specific alphabet. IDNs are stored as ASCII strings in the Domain Name System (DNS), which allows domain names to use the full Unicode character set in a backward-compatible manner. However, this approach also increases the effectiveness of homograph attacks, since it expands the available character set from the relatively small number of characters in a single alphabet to the many thousands of characters used by the world’s written languages. An attacker can thus register a new domain name that looks like the domain name of a legitimate website by substituting homographs. The new domain name can then be used to direct users to the spoof site, where information such as account passwords can be collected.
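The mapping between the displayed name and the stored ASCII string can be sketched in a few lines. This is a minimal illustration that assumes Python’s built-in "idna" codec (which follows the older IDNA 2003 rules); the hostname "bücher.example" and the function names are illustrative and not taken from this article.

```python
# Sketch: convert a Unicode hostname to the ASCII form stored in DNS and back.
# Assumes Python's built-in "idna" codec; the hostname is an illustrative example.

def to_dns_form(hostname: str) -> str:
    """Return the ASCII (Punycode-based) form of a hostname."""
    return hostname.encode("idna").decode("ascii")


def to_display_form(ascii_hostname: str) -> str:
    """Return the Unicode form that an IDN-aware client could display."""
    return ascii_hostname.encode("ascii").decode("idna")


ascii_name = to_dns_form("b\u00fccher.example")
print(ascii_name)                   # xn--bcher-kva.example
print(to_display_form(ascii_name))  # bücher.example
```

Labels that contain only ASCII characters pass through unchanged, while each non-ASCII label is replaced by an "xn--" label.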
Character Sets
Unicode includes many different character sets, some of which have similar-looking characters. For example, the Cyrillic, Greek and Latin alphabets each contain a character that looks like “O,” although these characters are assigned different codes in Unicode. This visual similarity creates the opportunity for successful IDN homograph attacks. The character sets with the greatest value in homograph attacks include ASCII, Cyrillic, Greek and Armenian.
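The “O” example can be checked directly. The short sketch below uses Python’s standard unicodedata module to print the code point and official name of the Latin, Cyrillic and Greek capital letters; the variable name is an assumption made for illustration.

```python
import unicodedata

# Latin, Cyrillic and Greek capital "O": alike on screen, distinct in Unicode.
lookalikes = ["\u004F", "\u041E", "\u039F"]

for ch in lookalikes:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+004F  LATIN CAPITAL LETTER O
# U+041E  CYRILLIC CAPITAL LETTER O
# U+039F  GREEK CAPITAL LETTER OMICRON

# String comparison works on code points, so the three never compare equal.
print(len(set(lookalikes)))  # 3
```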
ASCII
The most common homographs in ASCII are the uppercase “O” and the numeral “0”, and the lowercase “l” (L) and the numeral “1”. Depending on the typeface, “l” and “1” can also be homographs for an uppercase “I.” Some ASCII homographs are combinations of letters, such as a lowercase “r” followed by a lowercase “n” (“rn”), which can pass for a lowercase “m”.
Note how similar “rnicrosoft.com” appears to “microsoft.com”, depending on the font. This similarity is especially striking in the Tahoma font, which is the default for the URL address bar in Windows XP.
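One way to catch ASCII lookalikes such as these is to fold each name into a common “skeleton” before comparing it with a trusted name. The sketch below is illustrative only: the substitution table and the function names are assumptions, and the folding is deliberately lossy, so it can also flag harmless names (for example, any name containing “rn”).

```python
# Hypothetical substitution table built from the ASCII pairs described above.
ASCII_LOOKALIKES = [("rn", "m"), ("0", "o"), ("1", "l")]


def ascii_skeleton(name: str) -> str:
    """Fold common ASCII lookalikes into one canonical spelling."""
    folded = name.lower()
    for lookalike, canonical in ASCII_LOOKALIKES:
        folded = folded.replace(lookalike, canonical)
    return folded


def resembles(candidate: str, trusted: str) -> bool:
    """True if candidate differs from trusted but folds to the same skeleton."""
    if candidate.lower() == trusted.lower():
        return False
    return ascii_skeleton(candidate) == ascii_skeleton(trusted)


print(resembles("rnicrosoft.com", "microsoft.com"))  # True
print(resembles("microsoft.com", "microsoft.com"))   # False
```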
Cyrillic
Cyrillic is by far the most popular alphabet for IDN homograph attacks, primarily because it contains many homographs of Latin characters. Lowercase homographs have the greatest value for these attacks, since most users enter URLs in lowercase. Cyrillic has seven lowercase characters that are identical or virtually identical to characters in the Latin alphabet. The characters a, c, e, o, p, x and y exist in both alphabets and are visually indistinguishable in most fonts.
Uppercase homographs between the Cyrillic and Latin alphabets include the letters A, B, C, E, H, I, J, K, M, O, P, S, T and X. Furthermore, the Cyrillic characters З, Ч and б are very similar to the numerals 3, 4 and 6, depending on the font.
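The seven lowercase pairs listed above can be written as a small translation table. The following is a sketch rather than a complete confusables list: the table covers only those seven letters, and the spoofed hostname is a hypothetical example built on the reserved name example.com.

```python
# Cyrillic lowercase letters that look like Latin a, c, e, o, p, x and y.
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
})


def latin_skeleton(name: str) -> str:
    """Replace the seven Cyrillic lookalikes with their Latin counterparts."""
    return name.translate(CYRILLIC_TO_LATIN)


# Hypothetical spoof: every letter except "m" and "l" is Cyrillic.
spoof = "\u0435\u0445\u0430m\u0440l\u0435.com"

print(spoof == "example.com")                  # False
print(latin_skeleton(spoof) == "example.com")  # True
```

A name that differs from a trusted domain but produces the same skeleton is a candidate homograph and deserves closer inspection.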
Greek
The Greek alphabet is less common for homograph attacks since only the lowercase “o” and “v” are identical to their Latin counterparts. However, Greek also has 10 other lowercase characters that are close matches for the Latin lowercase letters e, i, k, n, p, t, u, w, x and y. Furthermore, the lowercase Greek “α” also looks like a Latin lowercase “a” in an italic font. Fourteen uppercase homographs exist between the Greek and Latin alphabets, with the letters Α, Β, Ε, Η, Ι, Κ, Μ, Ν, Ο, Ρ, Τ, Χ, Υ and Ζ being identical.
Armenian
The Armenian alphabet also has several homographs with the Latin alphabet that make it useful in homograph attacks. For example, the lowercase letters o, n and u and uppercase letters S and L are physically identical in these two alphabets for most modern fonts. The primary disadvantage of using Armenian is that many standard fonts don’t include Armenian characters, whereas most standard fonts do include Cyrillic and Greek characters.
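Because these attacks usually mix characters from another alphabet into an otherwise Latin name, one simple heuristic is to flag any hostname label that contains letters from more than one alphabet. The sketch below infers the alphabet from the first word of each character’s Unicode name, which is an approximation chosen for brevity; the function names are assumptions. It will not catch a spoof written entirely in one non-Latin alphabet.

```python
import unicodedata

# Alphabets discussed in this article; anything else is grouped as "OTHER".
KNOWN_SCRIPTS = ("LATIN", "CYRILLIC", "GREEK", "ARMENIAN")


def scripts_in_label(label: str) -> set:
    """Return the set of alphabets used by the letters in one label."""
    found = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits and hyphens are shared by all alphabets
        first_word = unicodedata.name(ch, "").split(" ", 1)[0]
        found.add(first_word if first_word in KNOWN_SCRIPTS else "OTHER")
    return found


def is_mixed_script(hostname: str) -> bool:
    """True if any dot-separated label mixes letters from several alphabets."""
    return any(len(scripts_in_label(label)) > 1 for label in hostname.split("."))


print(is_mixed_script("ex\u0430mple.com"))  # True: Cyrillic "а" among Latin letters
print(is_mixed_script("example.com"))       # False
```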
Defenses
The defenses against homograph attacks may be classified into client-side and server-side techniques.
Client Side
The general approach to defending against homograph attacks on the client side is to ensure that web browsers either don’t support Internationalizing Domain Names in Applications (IDNA) at all or allow users to disable such support. This typically means that a web browser displays IDNs in Punycode, which is a method of representing Unicode characters using only the much smaller ASCII character set. A less common solution is to deny access to IDNA sites.
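The effect of disabling IDN display can be sketched as follows. This is not how any particular browser implements the policy; it is a minimal illustration that assumes Python’s built-in "idna" codec, a hypothetical function name and flag, and a hypothetical spoof of the reserved name example.com.

```python
def address_bar_text(hostname: str, show_unicode: bool = False) -> str:
    """Return the text a client would show under a simple display policy.

    With show_unicode=False (the defensive setting), internationalized
    labels are shown in their ASCII "xn--" form instead of as Unicode.
    """
    ascii_form = hostname.encode("idna").decode("ascii")
    if show_unicode:
        return ascii_form.encode("ascii").decode("idna")
    return ascii_form


# Hypothetical spoof: the third letter is CYRILLIC SMALL LETTER A (U+0430).
spoof = "ex\u0430mple.com"

shown = address_bar_text(spoof)
print(shown)                     # an "xn--" label followed by ".com"
print(shown.startswith("xn--"))  # True
print(address_bar_text("example.com"))  # example.com
```

With the defensive setting, the spoofed name no longer resembles the legitimate one in the address bar, while purely ASCII names are displayed unchanged.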
Check out this Punycode converter for more information.
Server Side
Server-side defenses against homograph attacks primarily rely on policies implemented by the Internet Corporation for Assigned Names and Numbers (ICANN). These policies generally prohibit internationalized TLDs from containing non-Latin characters that could cause them to resemble an existing TLD that uses Latin characters. ICANN also encourages the use of longer TLDs, which are less likely to resemble existing Latin TLDs.
Conclusions
The best way to protect yourself and your organization from internationalized domain name attacks is to combine continuous training for all employees with robust web filtering designed for cybersecurity.
Remember, always check URLs in emails BEFORE you click on them. You should also always ensure you have connected to the right remote location (i.e. check the URL once you’ve arrived at the target site).