Do internet-based applications accept all valid domain names and e-mail addresses?
“Universal Acceptance” in the context of domain names and e-mail addresses may sound ironic to users of the Internet – which is itself a universal network.
We see domain names and e-mail addresses everywhere. The apps and websites we use from the time we wake up until we go back to sleep, whether for exercise, ordering breakfast, working, learning, or recreation on YouTube, Spotify or JioSaavn, are all linked to some domain name. The “account” that links a user to an app typically relies on an e-mail address in the back-end.
The problem arises because domain names and e-mail addresses are almost entirely in one script: Latin, the script used for English. Names in other scripts remain rare, even though support for them was introduced a long time ago.
This is not due to a lack of technology support.
Technology support has existed in the form of “Internationalized Domain Names” (IDNs) since early 2003, and in a more formal and robust manner since 2008.
The only thing holding back wider adoption is the classic “chicken and egg” problem of demand and supply.
Before getting into details, let’s see some of the possible IDNs:
• https://ยูเอทดสอบ.ไทย (Thai)
• https://સાર્વત્રિક-સ્વીકૃતિ-પરીક્ષણ.ભારત (Gujarati)
• https://تجربة-القبول-الشامل.موريتانيا (Arabic)
The above are fully functional domain names in
Here are some examples of Internationalized Email Addresses:
• ອີເມວ-ທົດລອງ@ສາກົນ-ການຍອມຮັບ-ທົດລອງ.ລາວ (Lao)
• ඉ-තැපැල්-පිරික්සුම@විශ්ව-සම්මුති-පිරික්සුම.ලංකා (Sinhala)
• 電子郵件測試@普遍適用測試.台灣 (Traditional Chinese)
• ईमेल-परीक्षण@सार्वभौमिक-स्वीकृति-परीक्षण.संगठन (Devanagari)
The above email IDs are fully functional.
Emails sent to any of these addresses will receive an automated reply with the words shown in the image below. Email Address Internationalization (EAI) is an effort within the Universal Acceptance (UA) group.
The limitation is that, to send such emails, users must use an e-mail service that can communicate with internationalized email addresses.
It may help to ask the e-mail service provider to include that functionality, as it is central to an inclusive internet, but this can be a tedious task today. To check whether an email service provider maintains an EAI-aware mail server, the UASG hosts a useful service named “EAI Check”: https://uasg.tech/eai-check/.
Enter the email address in use and it will report whether or not that email service is aware of the “SMTPUTF8” flag, which forms the backbone of EAI-aware implementations.
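To make the “SMTPUTF8” flag a little more concrete, here is a minimal Python sketch. A mail server lists the extensions it supports in its reply to the SMTP EHLO command, and an EAI-aware server includes SMTPUTF8 in that list. The function name and the sample reply below are illustrative assumptions; this is not a description of how the EAI Check service itself works.

```python
def advertises_smtputf8(ehlo_response: str) -> bool:
    """Return True if an EHLO reply lists the SMTPUTF8 extension."""
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD params".
        if len(line) < 4 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0]
        if keyword.upper() == "SMTPUTF8":
            return True
    return False


# Hypothetical EHLO reply from an EAI-aware server (made up for illustration).
sample_reply = (
    "250-mail.example.org\r\n"
    "250-SIZE 35882577\r\n"
    "250-8BITMIME\r\n"
    "250-SMTPUTF8\r\n"
    "250 PIPELINING"
)

print(advertises_smtputf8(sample_reply))  # True
```

Against a live server, Python's standard smtplib offers the same information: after calling ehlo() on an smtplib.SMTP connection, has_extn("smtputf8") reports whether the extension was advertised.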
Apart from the language-specific top-level domains (TLDs), ICANN has also introduced over 1000 new top-level domains in different categories that go beyond the well-known .COM and .ORG. For example:
.MUSIC
.TECHNOLOGY
.ISTANBUL
.SKY
The existing internet userbase, which deals with textual aspects of internet services, largely understands the English language and the Latin script.
There is a limited userbase that accesses the internet in its own language or script; a large part of it uses Russian and a few other languages. Most other users assume that such a service is unavailable.
This further accentuates the problem of adoption of IDNs and new top-level domains. A majority of the world’s software systems offering services do not allow non-Latin characters in domain names and e-mail addresses.
Thus, when users submit such identifiers, they are treated as spam and filtered out or rejected. The new top-level domains also get blocked by string-specific restrictions such as length validation.
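The effect of such restrictions can be illustrated with a small Python sketch. The legacy pattern below is hypothetical, written only to resemble the kind of ASCII-only, short-TLD check described above; it is not taken from any particular product. The permissive check is likewise an assumption about what a minimal structural test could look like, not a complete validator.

```python
import re

# Hypothetical legacy rule: ASCII characters only, TLD limited to 2-4 letters.
LEGACY_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")


def legacy_is_valid(address: str) -> bool:
    """Mimic an older validator that assumes short, ASCII-only addresses."""
    return LEGACY_PATTERN.match(address) is not None


def permissive_is_valid(address: str) -> bool:
    """Check structure only: local part, '@', and a dotted domain.

    No assumption is made about script or TLD length.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    labels = domain.split(".")
    # Require at least two labels, none empty, none containing whitespace.
    return len(labels) >= 2 and all(
        label and not any(ch.isspace() for ch in label) for label in labels
    )


samples = [
    "user@example.com",
    "user@example.technology",
    "ईमेल-परीक्षण@सार्वभौमिक-स्वीकृति-परीक्षण.संगठन",
]
for address in samples:
    print(address, legacy_is_valid(address), permissive_is_valid(address))
```

Under these assumptions, the legacy rule accepts only the first address, while the structural check accepts all three.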
The Internet Corporation for Assigned Names and Numbers (ICANN), the guardian of names and numbers in the Internet ecosystem and the administrative body for policy, has started to take steps towards sensitizing user communities and the software development community to this issue.
This led to the formation of the “Universal Acceptance Steering Group” (UASG), a group comprising individuals representing more than 120 companies with stakes in the Internet ecosystem.
In concrete terms, Universal Acceptance of domain names and email addresses is defined by a set of five operations. A software system that implements and follows them ensures “Universal Acceptance” of all domains and is thus UA-compliant.
The five operations are:
1. Accept
2. Validate
3. Store
4. Process
5. Display
1. Accept
Every field on the user interface that accepts user input should let the user enter all the characters that can be part of a domain name or e-mail address, and the software should ensure proper font and formatting so that the entered text is clearly visible and readable. In addition to the precautions required on the implementer’s side, there could be issues on the user’s side that prevent the Unicode characters in IDNs from being rendered correctly. To alleviate those issues, users can refer to the guidelines by the Unicode Consortium here.
2. Validate
There should be no discriminatory checks that needlessly invalidate technically valid formats of domain names and e-mail addresses, including those in local languages. To understand the possible reasons for these validation checks, this blog explains the details:
3. Store
When storing user-submitted text for later processing, the text should be saved either in its original form as submitted or in a format free of lossy conversions that could alter the fundamental nature of the entered text. UTF-8 is one of the most common and universally accepted formats for storing Unicode data.
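As a minimal sketch of lossless storage, the Python example below saves one of the sample addresses from earlier in this article into an in-memory SQLite database and reads it back unchanged. The table name `contacts` is a hypothetical choice for illustration, and SQLite merely stands in for whatever database a real system uses. The last lines show, for contrast, what a lossy ASCII conversion does to the same address.

```python
import sqlite3

address = "電子郵件測試@普遍適用測試.台灣"

# Store the address as Unicode text and read it back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT NOT NULL)")
conn.execute("INSERT INTO contacts (email) VALUES (?)", (address,))
(stored,) = conn.execute("SELECT email FROM contacts").fetchone()
conn.close()

print(stored == address)  # True: nothing was lost on the round trip

# A lossy conversion replaces every non-ASCII character and cannot be undone.
lossy = address.encode("ascii", errors="replace").decode("ascii")
print(lossy == address)  # False
```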
4. Process
Processing of the text, as per the Unicode norms, can happen at two stages of its journey.
4.1 Process on input:
Before the text is written to the database, various Unicode routines are available for faithful handling of Unicode text, and these can/should be applied to the text beforehand. The “can/should” is deliberately ambiguous: whether to apply these conversion routines, and at what stage, depends entirely on the business logic, control of which rightfully lies with the software developer. Typically, these processes involve:
1. Conversion/Non-conversion from Unicode to Punycode
If the string is going to be stored in a database column that is not Unicode-enabled (for legacy reasons; otherwise a fully Unicode-compliant database structure is definitely recommended), then the developer might want to convert the Unicode string to Punycode. However, if the developer expects to perform operations on the stored domain names, such as searching and sorting, it is desirable to preserve the Unicode form of the domain name.
2. Conversion as per the Unicode Normalization
As per the IDNA protocol, the domain name has to be in a “Normalized” form. This helps prevent two different representations of the same label. Depending on the business logic, the developer can decide whether this conversion is needed at this stage.
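The two routines above can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than a recommended implementation: the helper names are hypothetical, and Python's built-in "idna" codec follows the older IDNA 2003 rules, so a system targeting the 2008 revision would likely need a dedicated library instead.

```python
import unicodedata


def normalize_domain(domain: str) -> str:
    """Apply Unicode Normalization Form C so equal-looking text compares equal."""
    return unicodedata.normalize("NFC", domain)


def to_ascii(domain: str) -> str:
    """Convert a Unicode domain to its Punycode ("xn--") form.

    Uses Python's built-in codec, which implements IDNA 2003.
    """
    return domain.encode("idna").decode("ascii")


# The same visible name written in two ways.
composed = "b\u00fccher.example"     # "ü" as a single code point
decomposed = "bu\u0308cher.example"  # "u" followed by a combining diaeresis

print(composed == decomposed)                    # False
print(normalize_domain(decomposed) == composed)  # True
print(to_ascii(normalize_domain(decomposed)))    # xn--bcher-kva.example
```

Normalizing before storage means both spellings end up as the same database value, which matters if the Unicode form is kept for searching and sorting.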
4.2 Process on output:
This operation usually takes place after the text is retrieved from the database and is on its way to the user interface. All the conversions applied during “Process on input” can be maintained, reversed, or newly applied to the text, depending on the requirement and business logic.
5. Display
Finally, after the text is retrieved from the back-end of the system, it should be rendered back into its Unicode form so that a native user of the script or language can read it. Software systems that comply with all five of the above operations while dealing with IDNs and internationalized email addresses can be considered truly compliant with the notion of “Universal Acceptance”.
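For the Display step, a system that stored a domain in Punycode form needs to turn it back into Unicode before showing it to the user. The Python sketch below assumes the standard library's "idna" codec (IDNA 2003 rules) and a hypothetical helper name; values that are already in Unicode, or that cannot be decoded, are returned unchanged.

```python
def display_form(stored_domain: str) -> str:
    """Return the Unicode form of a stored domain for display purposes."""
    try:
        # Decode any "xn--" labels back to Unicode; plain ASCII passes through.
        return stored_domain.encode("ascii").decode("idna")
    except UnicodeError:
        # Already Unicode, or not decodable: show it as stored.
        return stored_domain


print(display_form("xn--bcher-kva.example"))  # bücher.example
print(display_form("example.com"))            # example.com
```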
Why is this so important at ICANN?
ICANN is entrusted with maintaining and managing policy relating to names and numbers, typically domain names and IP addresses. In addition, ICANN has become a global forum for discourse on internet policy development. At various ICANN meetings, “on-boarding of the next billion” frequently forms the core of the discussions.
Given the current linguistic spread and the demographics of the new internet users, catering to the needs of non-Latin and non-English users becomes vital.
As per ICANN’s Strategic Plan for fiscal years 2021-2025, ICANN’s strategic goal is:
Foster competition, consumer choice, and innovation in the Internet space by increasing awareness and encouraging readiness for Universal Acceptance, IDN implementation, and IPv6.
This is supported by the technical protocol developments on the IDN and Internationalized Email front.
Thus, two of the major aspects of this discourse, the technical feasibility of the multilingual identifiers as well as the availability of affordable internet in otherwise non-connected regions of the world, have been fulfilled.
This leaves only the third aspect to be fulfilled: enabling internet-based services to cater to a truly multilingual internet. It is towards that goal that ICANN wants support from the community.
By Akshat Joshi
Akshat Joshi is the founder of ThinkTrans, a startup working in the techno-multilingual Internet domain.