
Interesting addition to Unicode

A good friend of mine, and fellow Trekkie, showed me something very interesting in the Unicode man page (type "man unicode" on a Unix system, or read it online):

UCS contains the characters required to represent practically all known languages. This includes not only the Latin, Greek, Cyrillic, Hebrew, Arabic, Armenian, and Georgian scripts, but also Chinese, Japanese and Korean Han ideographs as well as scripts such as Hiragana, Katakana, Hangul, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Khmer, Bopomofo, Tibetan, Runic, Ethiopic, Canadian Syllabics, Cherokee, Mongolian, Ogham, Myanmar, Sinhala, Thaana, Yi, and others. For scripts not yet covered, research on how to best encode them for computer usage is still going on and they will be added eventually. This might eventually include not only Hieroglyphs and various historic Indo-European languages, but even some selected artistic scripts such as Tengwar, Cirth, and Klingon. UCS also covers a large number of graphical, typographical, mathematical and scientific symbols, including those provided by TeX, Postscript, APL, MS-DOS, MS-Windows, Macintosh, OCR fonts, as well as many word processing and publishing systems, and more are being added.
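If you want to check for yourself which scripts have made it in, Python's standard unicodedata module is a handy way to poke at the Unicode character database. Here's a small sketch (Python is my choice; nothing in the man page calls for it) that looks up one sample character from a few of the scripts listed above and then tries a name for a script that isn't encoded. "KLINGON LETTER A" is a hypothetical name I made up for illustration, and the results depend on the Unicode version bundled with your Python.

```python
import unicodedata

# One sample character from a few of the scripts named in the man page excerpt.
SAMPLES = {
    "Runic": "\u16A0",
    "Ogham": "\u1681",
    "Cherokee": "\u13A0",
    "Hiragana": "\u3042",
}


def is_encoded(name: str) -> bool:
    """Return True if Python's bundled Unicode database has a character with this name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        # lookup() raises KeyError when no character has the given name.
        return False


if __name__ == "__main__":
    for script, ch in SAMPLES.items():
        # unicodedata.name() returns the official character name for a code point.
        print(f"{script}: U+{ord(ch):04X} {unicodedata.name(ch)}")

    # Hypothetical name for illustration only: Klingon is not encoded, so this prints False.
    print(is_encoded("KLINGON LETTER A"))
```

Looking characters up by name rather than by code point keeps the check readable: a failed lookup is a quick sign that a script hasn't been encoded yet.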

When Klingon gets added to Unicode, we should all raise a Romulan ale (or, more to the point, a barrel of bloodwine) to celebrate!

Qapla’!

Published Aug 21, 2008

I am a computer scientist specializing in building machine learning powered products. I’m currently a machine learning developer at Local Logic.