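Work began in the 1980s on a single multilingual character encoding system, originally 16-bit (2 bytes per character), that could represent nearly all characters used in the major languages of the world. The resulting standard was called Unicode. Unicode has since grown beyond 16 bits; UTF-16 represents the additional characters as pairs of 16-bit units.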
The many character encoding systems that existed before Unicode also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially a server) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data runs the risk of corruption.
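The conflict between legacy encodings can be seen in a few lines. The following is a small Python sketch, not LiveCode; the choice of the byte 0xE9 and of the three code pages is only an illustration.

```python
# One byte value, decoded under two legacy encodings, gives two different characters.
raw = bytes([0xE9])
as_latin1 = raw.decode("latin-1")   # Western European: e with acute accent
as_cp1251 = raw.decode("cp1251")    # Windows Cyrillic: short i

# One character, encoded under three legacy encodings, gives three different numbers.
e_acute = "\u00e9"
numbers = {name: e_acute.encode(name)[0] for name in ("latin-1", "cp437", "mac_roman")}

# Unicode assigns the character one code point, whatever the platform.
code_point = ord(e_acute)

print(as_latin1, as_cp1251, numbers, hex(code_point))
```

Text written under one of these code pages and read under another is silently corrupted, which is the risk described above.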
Unicode is changing all that!
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.
Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.
There is a good general introduction to Unicode that is not overly technical on the SIL International web site.
Do the exercise "Exploring Unicode".
The unicodeText property of LiveCode fields.
Mac (see http://www.alanwood.net/unicode/fonts_macosx.html for an exhaustive list):
    Lucida Grande (Latin, Cyrillic, Greek...)
    Fang Song (Chinese)
    Times New Roman
    (Can use Windows Unicode fonts)
Windows (see http://www.alanwood.net/unicode/fonts.html for an exhaustive list):
    Arial
    Lucida family (Latin, Cyrillic, Greek...)
    Tahoma
    MS Hei, MS Song (Chinese)
    Times New Roman
From About chunk expressions:
Important! Characters in chunk expressions are assumed to be single-byte characters. To successfully use chunk expressions with Unicode (double-byte) text, you must treat each double-byte character as a set of two single-byte characters. For example, to get the numeric value of the third Unicode character in a field, use statements like the following:
set the useUnicode to true
get charToNum(char 5 to 6 of field "Chinese Text")
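The byte arithmetic behind "char 5 to 6" can be sketched outside LiveCode. The Python below is only an illustration: the sample string, the helper name unicode_char_num, and the little-endian byte order are assumptions, and the approach holds only for characters that fit in a single 16-bit unit.

```python
def unicode_char_num(data: bytes, n: int, byteorder: str = "little") -> int:
    """Return the numeric value of the n-th (1-based) two-byte character."""
    first = 2 * n - 1                  # 1-based index of its first byte: 5 when n == 3
    pair = data[first - 1:first + 1]   # bytes `first` to `first + 1`, counted from 1
    return int.from_bytes(pair, byteorder)

text = "中文字符"                      # four characters, two bytes each in UTF-16
data = text.encode("utf-16-le")        # assumed byte order; no byte order mark
third = unicode_char_num(data, 3)

print(len(data), hex(third))
```

In general, character n of double-byte text occupies single-byte characters 2n - 1 to 2n, so the third character is chars 5 to 6 and the fifth is chars 9 to 10.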
How to enter or display Unicode text in a field.
You display double-byte text in its correct language by setting its textFont property to a Unicode font. You can either put the text into the field and set the textFont in a handler or the message box, or enter the text manually after using the operating system's built-in text entry tools to choose a language.
For example, to display double-byte Japanese characters that are on line 12 of a field, use a statement like the following:
set the textFont of line 12 of field 1 to "Osaka,Japanese"
When you manually enter text in a language that does not use the Roman alphabet, using the operating system's tools, LiveCode automatically sets the textFont of the text you enter to the appropriate font for the language you have chosen.
How to find out whether text in a field is Unicode
You find out whether text in a field is Unicode text by examining its textFont property. The textFont of Unicode text consists of the font name, a comma, and either Unicode or the language the text is in. The following example statement checks whether line 3 of a field is Unicode:
if the effective textFont of line 3 of field 1 contains comma then answer "It's Unicode!"
Note: Characters in chunk expressions are assumed to be single-byte characters. To check a Unicode character's textFont using a chunk expression, treat it as two single-byte characters. For example, to check the fifth character in a field consisting of double-byte characters, use the expression the effective textFont of char 9 to 10 of field 1.
How to convert between Unicode (UTF-16) and UTF-8 text.
LiveCode displays non-Roman-alphabet languages using Unicode (UTF-16). You use the uniDecode and uniEncode functions to convert between UTF-16 and UTF-8.
The following statement converts a variable's contents from UTF-8 to UTF-16, and places the resulting Unicode text in a field:
put uniEncode(myVariable,"UTF8") into field "My Field"
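To see what such a conversion does to the underlying bytes, here is a rough Python equivalent. It is not LiveCode's implementation; the function names and the little-endian byte order are assumptions made for the sketch.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 (little-endian, no byte order mark)."""
    return data.decode("utf-8").encode("utf-16-le")

def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 (little-endian) bytes as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")

utf8_bytes = "日本語".encode("utf-8")   # 3 characters, 3 bytes each in UTF-8
utf16_bytes = utf8_to_utf16(utf8_bytes)  # 3 characters, 2 bytes each in UTF-16

print(len(utf8_bytes), len(utf16_bytes))
```

The characters are unchanged by the round trip; only the number and layout of the bytes differ.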
How to convert between Unicode and ASCII text.
You use the uniEncode and uniDecode functions to convert text from double-byte (Unicode) to single-byte ASCII, or vice versa.
To convert a string of single-byte characters to Unicode text, use a statement like the following:
put uniEncode(field "Text") into myUnicodeText
To convert a string of double-byte characters to single-byte, use a statement like the following:
put uniDecode(the unicodeText of field "Japanese Text") into convertedText
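The effect of moving between single-byte and double-byte text can be illustrated in Python. This is a sketch under assumptions: the helper names are made up, the byte order is taken to be little-endian, and replacing unrepresentable characters with "?" is a choice made here rather than a statement about what uniDecode does.

```python
def to_double_byte(text_bytes: bytes) -> bytes:
    """Widen ASCII bytes to two-byte UTF-16 units."""
    return text_bytes.decode("ascii").encode("utf-16-le")

def to_single_byte(utf16: bytes) -> bytes:
    """Narrow UTF-16 to ASCII; characters outside ASCII become '?'."""
    return utf16.decode("utf-16-le").encode("ascii", errors="replace")

wide = to_double_byte(b"Text")
narrow = to_single_byte(wide)
lossy = to_single_byte("日本".encode("utf-16-le"))

print(wide, narrow, lossy)
```

ASCII text doubles in length, with a zero byte added to each character, and converts back without loss. Text outside the single-byte range cannot survive the narrowing step.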
How to import a Unicode text file.
You use the unicodeText property to import a file that contains Unicode text. To put the text from a Unicode file into a field, use a statement like the following in a handler or the message box:
set the unicodeText of field "Text" to URL "binfile:my.txt"
If the file contains text in multiple languages, LiveCode automatically sets the textFont of language runs to the appropriate Unicode font.
Important! This method works only if the file you are importing contains Unicode (UTF-16) data. It will not work for other encoding methods such as UTF-8 or Shift-JIS.
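One way to check a file's encoding before importing it is to look for a UTF-16 byte order mark at the start of its bytes. The Python sketch below shows the idea; the function names are hypothetical, and the check is only a heuristic, since UTF-16 files may lack the mark and a UTF-32 little-endian mark begins with the same two bytes.

```python
import codecs

def looks_like_utf16(data: bytes) -> bool:
    """True if the data begins with a UTF-16 byte order mark."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

def read_utf16(data: bytes) -> str:
    """Decode UTF-16 data (e.g. from open(path, "rb").read()), refusing other encodings."""
    if not looks_like_utf16(data):
        raise ValueError("no UTF-16 byte order mark; data may be UTF-8 or Shift-JIS")
    return data.decode("utf-16")   # this codec reads the mark and removes it

sample = codecs.BOM_UTF16_LE + "日本語".encode("utf-16-le")
print(read_utf16(sample))
print(looks_like_utf16("日本語".encode("utf-8")))
```

Data that fails the check would need to be converted to UTF-16 first, rather than being assigned directly as Unicode text.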
useUnicode property: Specifies whether the charToNum and numToChar functions assume a character is double-byte.
unicodeText property: Specifies the text in a field, represented as Unicode (double-byte characters).
uniDecode function: Converts a string from Unicode to single-byte text.
uniEncode function: Converts a string from single-byte text to Unicode.
fontLanguage function: Returns the language associated with a Unicode font.
Stack "unicodeTrials.rev": Examples of how to use Unicode in LiveCode, including referring to chunks, reading Unicode text from files, converting between UTF-8 and UTF-16, etc.