A really interesting problem that came up today:
- How do you sort a list of products in Chinese?
- The Chinese language doesn’t use an alphabet, so how can we sort it at all?
The Kangxi dictionary, compiled during the Qing dynasty, lists 46,964 characters, so giving every character its own fixed position, the way an alphabet does for letters, is not workable.
There have been numerous attempts at solving this, but only two are widely used.
The first uses radicals and stroke counts. The second orders characters by sound, using a phonetic transcription such as Pinyin.
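The phonetic approach can be sketched as follows. This is a minimal illustration, not a real collation: `PINYIN` is a hypothetical, hand-made table mapping each character to its Pinyin syllable and tone number, and words are compared character by character, first by syllable, then by tone. A real system would need a complete dictionary and a rule for characters with more than one reading.

```python
# Hypothetical, hand-made table: character -> (Pinyin syllable without tone mark, tone 1-4).
PINYIN = {
    "吃": ("chi", 1),
    "河": ("he", 2),
    "口": ("kou", 3),
    "林": ("lin", 2),
    "木": ("mu", 4),
    "你": ("ni", 3),
    "人": ("ren", 2),
    "水": ("shui", 3),
}

# Placeholder for characters missing from the table: sorts after every known syllable.
UNKNOWN = ("\uffff", 0)


def pinyin_key(word: str) -> list[tuple[str, int]]:
    """Sort key: compare words character by character, by syllable, then by tone."""
    return [PINYIN.get(ch, UNKNOWN) for ch in word]


words = ["人", "水", "口", "吃", "河", "林", "你", "木"]
print(sorted(words, key=pinyin_key))
# ['吃', '河', '口', '林', '木', '你', '人', '水']
```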
Sorting by Radicals
Wikipedia gives the following definition of a radical:
A radical (from Latin radix, meaning “root”) is a basic identifiable component of every Chinese character. (This includes not only Chinese Hanzi, but also the Japanese Kanji, Korean Hanja and Vietnamese Chữ nôm and Chữ nho.) In languages that use Chinese characters, a radical is called 部首 (Pinyin: bùshǒu; Japanese bushu and Korean busu), literally meaning “section header”. Radicals are important to the organisation and use of Chinese dictionaries, Japanese Kanji dictionaries, and Korean Hanja dictionaries.
Wikipedia
The first method, sorting by radical, uses the set of 214 radicals compiled in China in 1615.
That is a far more manageable number than the 46,964 characters of the Kangxi dictionary.
Each character (or “word”) is filed under the radical it is derived from, and the radicals themselves have a fixed order, running from the simple 1-stroke radicals to the most complex ones. Try the application from Jim Breen and you will notice that very complex characters sometimes map to a simple 2-stroke radical, or to a combination of radicals. Within a single radical, characters are then ordered by the number of strokes needed to write the rest of the character.
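The ordering rule above can be sketched as a sort key. This is an illustration under stated assumptions: `RADICAL_INDEX` is a hypothetical, hand-made table covering only eight characters, giving each one its Kangxi radical number (1 to 214) and the strokes remaining after the radical. Words are compared character by character.

```python
# Hypothetical, hand-made table: character -> (Kangxi radical number 1-214,
# strokes remaining after the radical). A real system would cover every character.
RADICAL_INDEX = {
    "人": (9, 0),
    "你": (9, 5),
    "口": (30, 0),
    "吃": (30, 3),
    "木": (75, 0),
    "林": (75, 4),
    "水": (85, 0),
    "河": (85, 5),
}

# Placeholder for characters missing from the table: sorts after radical 214.
UNKNOWN = (999, 0)


def radical_key(word: str) -> list[tuple[int, int, str]]:
    """Sort key: radical number first, then remaining strokes, per character."""
    # The character itself is appended to break ties between identical counts.
    return [RADICAL_INDEX.get(ch, UNKNOWN) + (ch,) for ch in word]


words = ["河", "林", "你", "吃", "人", "水", "口", "木"]
print(sorted(words, key=radical_key))
# ['人', '你', '口', '吃', '木', '林', '水', '河']
```

Because Python compares lists element by element, multi-character words are ordered by their first character, then their second, and so on.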
I am sure many more rules need to be considered, but this is a fascinating problem when it comes to databases. One solution might be to store the radical (and stroke count) of every character in a separate table and look up each character you want to sort.
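The lookup-table idea can be sketched with Python's built-in `sqlite3` module. The table and column names (`char_radicals`, `products`) and the data are hypothetical, and to keep the query short only the first character of each product name is looked up; a complete solution would have to compare every character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup table: one row per character.
    CREATE TABLE char_radicals (
        ch      TEXT PRIMARY KEY,
        radical INTEGER NOT NULL,  -- Kangxi radical number (1-214)
        strokes INTEGER NOT NULL   -- strokes remaining after the radical
    );
    CREATE TABLE products (name TEXT NOT NULL);
""")
conn.executemany(
    "INSERT INTO char_radicals VALUES (?, ?, ?)",
    [("人", 9, 0), ("口", 30, 0), ("木", 75, 0), ("水", 85, 0), ("河", 85, 5)],
)
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("河粉",), ("木瓜",), ("水壶",), ("口罩",), ("人参",)],
)

# Join each product on its first character (substr counts characters, not bytes,
# for TEXT values). Unmatched names sort last, then order by radical and strokes.
rows = conn.execute("""
    SELECT p.name
    FROM products AS p
    LEFT JOIN char_radicals AS c ON c.ch = substr(p.name, 1, 1)
    ORDER BY c.radical IS NULL, c.radical, c.strokes, p.name
""").fetchall()

sorted_names = [name for (name,) in rows]
print(sorted_names)
# ['人参', '口罩', '木瓜', '水壶', '河粉']
```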
Multi-lingual lists
Now that you know how Chinese lists are sorted, consider a product database in which every product name is translated one-to-one between English and Chinese. Sorting the two lists produces two very different orders, because the rules behind each sort have nothing to do with each other.
In this case the problem is really a cultural one, as opposed to the initial problem of sorting Chinese characters, because each sort order is based on cultural ideas about language and writing. If you know both languages and sort the English list, the Chinese translations end up in completely different positions than they have in the sorted Chinese list. Going back and forth between the two lists gives different results depending on the language.
Sorting Chinese lists is a technical problem, and one that shows that IT services and systems are not designed with these multicultural information structures in mind.
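The mismatch can be made concrete with a small sketch. The catalogue and the radical table below are hypothetical, and the Chinese names are sorted only by the radical and remaining strokes of their first character; the point is simply that the same product lands at different positions depending on which language the list is sorted in.

```python
# Hypothetical product catalogue: (English name, Chinese name), translated one-to-one.
PRODUCTS = [
    ("rice noodles", "河粉"),
    ("papaya", "木瓜"),
    ("kettle", "水壶"),
    ("face mask", "口罩"),
    ("ginseng", "人参"),
]

# Hypothetical table for first characters only: (Kangxi radical number, remaining strokes).
FIRST_CHAR_RADICAL = {"人": (9, 0), "口": (30, 0), "木": (75, 0), "水": (85, 0), "河": (85, 5)}

# Sort the same catalogue twice: alphabetically in English, by radical in Chinese.
by_english = sorted(PRODUCTS, key=lambda p: p[0])
by_chinese = sorted(PRODUCTS, key=lambda p: FIRST_CHAR_RADICAL[p[1][0]])

for product in PRODUCTS:
    english, chinese = product
    print(english, chinese,
          "English position:", by_english.index(product),
          "Chinese position:", by_chinese.index(product))
```

For example, "papaya" (木瓜) is fourth in the English list but third in the Chinese one, while "ginseng" (人参) moves from second to first.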
I am sure I missed several things in this short article, so if you spot an error, leave a comment and I will study some more.