An Introduction to Enterprise Software Localization
In my previous post I discussed some of the key factors to think about when taking your enterprise software business from one country to others. One of the key considerations is product globalization and localization. Sometimes these terms are used interchangeably, but I prefer to use them with the following specific meanings:
- Globalization – ensuring that the product has the underlying capability to support international requirements. For example, to accept and display multiple character sets, different date and time formats, and to recognize time zones.
- Localization – supporting specific needs for a particular country or region. For example, translation into a specific language or dialect, and providing local business functionality.
It is easy to think that this topic can be left to your engineers. But in my experience, it is important that leadership have at least some understanding of the issues and challenges involved because, in reality, trade-offs will need to be made between having the perfect local product and what is practical and affordable. Very rarely is a company in a position to build a truly global product from the outset – you will normally have made the choice (consciously or unconsciously) to get your product to market as fast as possible in your home territory.
Translation and other Language Considerations
Most internet searches for ‘software localization’ will lead you to the topic of translation. However, translation itself is of no use if the software does not support the characteristics of the target language such as:
- Character sets – the English language uses well under 100 separate characters – 26 each of lower and upper case letters, 10 digits, and a handful of punctuation symbols. However, most other Western European languages require accented characters (ã, ç, é, ï, ñ, etc.) and some Eastern European languages use a completely different alphabet (for example Cyrillic). Travel further east and you encounter languages such as Tamil and Chinese that use non-alphabetic writing systems. A global application needs to be able to input, process, store, and output all of the relevant characters, which in practice means supporting Unicode throughout.
- Sorting – it often comes as a surprise to English speakers that even other Western European languages present additional challenges when it comes to sorting text – or collation as it is formally known. The issue is generally around how accented characters are sorted – either with their unaccented version, immediately after it, or at the end of the alphabet. Other issues include the handling of ligatures (is ‘æ’ treated as one character or two?), multiple characters treated as one letter (‘ch’ in traditional Spanish), and whether or not case counts (sort upper and lower case together or one after the other). Depending on your application, your users might only ever see text in one language, but if text from multiple languages can be seen simultaneously, producing a good user experience becomes much harder.
- Text Direction – most languages are written left to right (LTR) but some, such as Arabic and Hebrew, are written right to left (RTL). This impacts the entire UI design, not just the text itself.
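To make the sorting and text direction points more concrete, here is a minimal Python sketch using only the standard library. The helper names (strip_accents, accent_insensitive_key, is_right_to_left) are my own, chosen for illustration. The sort key implements just one of the policies described above (accented letters sorted with their unaccented version), and the direction check is a simple first-strong-character heuristic. A real product would more likely rely on the Unicode Collation Algorithm through a library such as ICU, so treat this as a sketch rather than a recommended implementation.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks so that 'é' compares like 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_key(word: str) -> tuple:
    """Illustrative sort key: unaccented, case-folded text first,
    with the original word as a tie-break."""
    return (strip_accents(word).casefold(), word)


def is_right_to_left(text: str) -> bool:
    """Guess the direction from the first character with a strong
    bidirectional class ('R'/'AL' mean right to left, 'L' left to right)."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return True
        if bidi == "L":
            return False
    return False  # no strongly directional character found


# Normalize to composed form so the code point comparison below is predictable.
words = [unicodedata.normalize("NFC", w)
         for w in ["zebra", "étude", "apple", "Eagle", "çà"]]

by_code_point = sorted(words)
accents_with_base = sorted(words, key=accent_insensitive_key)

print(by_code_point)      # ['Eagle', 'apple', 'zebra', 'çà', 'étude']
print(accents_with_base)  # ['apple', 'çà', 'Eagle', 'étude', 'zebra']

print(is_right_to_left("\u05e9\u05dc\u05d5\u05dd"))  # Hebrew word: True
print(is_right_to_left("Hello"))                     # False
```

The default sort puts the capitalized and accented words in places an end user would not expect, which is exactly the surprise described above. Note that this key does nothing for ‘æ’ or ‘ch’, which is why language-specific collation rules are needed.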
Data Formats
The way that dates, times, and numbers are written around the world shows considerable variation and can lead to confusion if not presented in the way that the reader is used to.
- There are many ways of writing a date, including: the order of the day, month, and year; whether the month is written out or represented by a number; and the symbol used to separate the parts. In the United States the convention is to use the sequence month-day-year, whereas most other countries use day-month-year and some, such as China and Japan, use year-month-day. Also, we (westerners) tend to assume that everyone uses the Gregorian calendar, but some applications may need to use others such as the Islamic, Hebrew, or Solar Hijri calendars.
- The most common variation when specifying time is between the 24-hour clock and the 12-hour one, which requires an am/pm suffix (in the local language, of course). Time zones must also be considered, including the handling of daylight saving time.
- When representing numbers, the decimal point may be represented by a period or a comma, and for ease of reading the digits are usually grouped in threes, with the groups separated by commas, periods, or spaces depending on the country.
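The variations above can be sketched in a few lines of Python. The FormatConvention class, the CONVENTIONS table, and the three format_* functions below are hypothetical names, and the table is hand-written for illustration only. It is not authoritative locale data, and a real product would more likely take its rules from the Unicode CLDR through a library such as Babel or ICU.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FormatConvention:
    date_pattern: str     # strftime pattern giving order and separator
    twelve_hour: bool     # True for a 12-hour clock with am/pm suffix
    decimal_mark: str
    group_separator: str


# Hand-written sample conventions, for illustration only.
CONVENTIONS = {
    "en-US": FormatConvention("%m/%d/%Y", True, ".", ","),
    "en-GB": FormatConvention("%d/%m/%Y", False, ".", ","),
    "de-DE": FormatConvention("%d.%m.%Y", False, ",", "."),
    "fr-FR": FormatConvention("%d/%m/%Y", False, ",", " "),
    "ja-JP": FormatConvention("%Y/%m/%d", False, ".", ","),
}


def format_date(value: datetime, conv: FormatConvention) -> str:
    return value.strftime(conv.date_pattern)


def format_time(value: datetime, conv: FormatConvention) -> str:
    if conv.twelve_hour:
        hour = value.hour % 12 or 12
        # The suffix is hard-coded in English here; it would need translating.
        suffix = "am" if value.hour < 12 else "pm"
        return f"{hour}:{value.minute:02d} {suffix}"
    return f"{value.hour:02d}:{value.minute:02d}"


def format_number(value: float, conv: FormatConvention, decimals: int = 2) -> str:
    text = f"{value:,.{decimals}f}"  # e.g. '1,234,567.89'
    # Swap the separators via a placeholder so they do not overwrite each other.
    return (text.replace(",", "\x00")
                .replace(".", conv.decimal_mark)
                .replace("\x00", conv.group_separator))


moment = datetime(2024, 3, 4, 15, 30)
for tag, conv in CONVENTIONS.items():
    print(tag, format_date(moment, conv), format_time(moment, conv),
          format_number(1234567.891, conv))
# en-US 03/04/2024 3:30 pm 1,234,567.89
# en-GB 04/03/2024 15:30 1,234,567.89
# de-DE 04.03.2024 15:30 1.234.567,89
# fr-FR 04/03/2024 15:30 1 234 567,89
# ja-JP 2024/03/04 15:30 1,234,567.89
```

The same moment in time reads as 4 March in some countries and 3 April in others, which is the kind of confusion described above. This sketch ignores time zones, non-Gregorian calendars, and digit groupings other than threes.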
There are many other entities whose formatting varies around the world, including telephone numbers, addresses, and even names (for example, given name then family name or the other way around). There are now attempts to define global standards for these (for example the ITU’s E.164 for telephone numbers), but local practice persists and, with an increasing focus on user experience, following local practice is more important than adopting the latest technical standard.
Functionality
Depending on the business domain of the application, there will most likely be different processes and calculations in different countries that must be accommodated as part of localization. Payroll and sales tax calculations, for instance, follow different rules in each jurisdiction.
Culture
This is perhaps the trickiest aspect of localizing a product because, unlike the rules for character handling, formatting, and functionality, cultural norms are very rarely documented. Examples of where you can run into trouble include the use of icons (symbols can have very different meanings in different parts of the world – think Red Cross / Red Crescent) and even product names. For an interesting list, see the article by Mike Fromowitz in Campaign Asia-Pacific.
In conclusion, when planning your overseas expansion don’t just think about translation of your product – there are a host of technical, functional, and cultural challenges as well. Future posts will dig deeper into each of these and provide further examples. I welcome any comments or feedback.