The ideographic character sets used in Asia cannot use the simple 1:1 mapping between characters in the language and the one-byte (8-bit) char type. These languages have too many characters to be represented in a single byte, so a multibyte string uses one or more bytes per character. An AnsiString can contain a mix of single-byte and multibyte characters.
The lead byte of every multibyte character code is taken from a reserved range that depends on the specific character set. The second and subsequent bytes can sometimes be the same as the character code for a separate one-byte character, or they can fall in the range reserved for the first byte of multibyte characters. Thus, the only way to tell whether a particular byte in a string represents a single character or is part of a multibyte character is to read the string from the beginning, treating each lead byte from the reserved range as the start of a character of two or more bytes.
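The scan described above can be sketched with ByteType from SysUtils, which classifies a byte as single, lead, or trail. This is a minimal sketch, not a definitive implementation: the unit name MbcsScanSketch and the function CountMbcsChars are hypothetical, and it assumes a Delphi version in which string is an AnsiString encoded in the system code page.

```pascal
unit MbcsScanSketch;

interface

uses
  SysUtils;

// Hypothetical helper (not part of the RTL): returns the number of
// characters, not bytes, in S.
function CountMbcsChars(const S: string): Integer;

implementation

function CountMbcsChars(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    // A trail byte belongs to the character started by the preceding
    // lead byte, so only single bytes and lead bytes are counted.
    if ByteType(S, I) <> mbTrailByte then
      Inc(Result);
end;

end.
```

For a string of single-byte characters the result equals Length(S). The RTL function ByteToCharLen(S, Length(S)) should return the same count without a hand-written loop.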
When writing code for Asian locales, make sure that all string manipulation goes through functions that are designed to parse strings into multibyte characters.
Delphi provides you with many of these runtime library functions, as listed in the following table:
Runtime library functions
AdjustLineBreaks | AnsiStrLower | ExtractFileDir |
AnsiCompareFileName | AnsiStrPos | ExtractFileExt |
AnsiExtractQuotedStr | AnsiStrRScan | ExtractFileName |
AnsiLastChar | AnsiStrScan | ExtractFilePath |
AnsiLowerCase | AnsiStrUpper | ExtractRelativePath |
AnsiLowerCaseFileName | AnsiUpperCase | FileSearch |
AnsiPos | AnsiUpperCaseFileName | IsDelimiter |
AnsiQuotedStr | ByteToCharIndex | IsPathDelimiter |
AnsiStrComp | ByteToCharLen | LastDelimiter |
AnsiStrIComp | ByteType | StrByteType |
AnsiStrLastChar | ChangeFileExt | StringReplace |
AnsiStrLComp | CharToByteIndex | WrapText |
AnsiStrLIComp | CharToByteLen | |
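As an illustration of why these functions matter, consider testing whether a path ends with a delimiter. In some double-byte character sets (Shift-JIS is one example) a trail byte can have the same value as the backslash, so comparing the last byte directly can give a false positive. The sketch below uses IsPathDelimiter from the table instead; the unit name MbcsPathSketch and the function EndsWithPathDelimiter are hypothetical names chosen for this example.

```pascal
unit MbcsPathSketch;

interface

uses
  SysUtils;

// Hypothetical helper (not part of the RTL): True when S ends with a
// genuine path delimiter.
function EndsWithPathDelimiter(const S: string): Boolean;

implementation

function EndsWithPathDelimiter(const S: string): Boolean;
begin
  // IsPathDelimiter tests the character at the given index rather than
  // the raw byte, so a trail byte that happens to equal the delimiter
  // is not mistaken for one. A naive test such as
  // S[Length(S)] = PathDelim offers no such protection.
  Result := (S <> '') and IsPathDelimiter(S, Length(S));
end;

end.
```

AnsiLastChar offers a similar guarantee when the last character itself is needed: it returns a pointer to the start of the final character, whether that character occupies one byte or several.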
Remember that the length of a string in bytes does not necessarily correspond to its length in characters. Be careful not to truncate a string in the middle of a multibyte character. Do not pass a character as a parameter to a function or procedure, since the size of a character cannot be known in advance. Instead, always pass a pointer to a character or a string.
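One way to respect the truncation rule is to check the last byte that would be kept and drop it if it is a lead byte. The following is a minimal sketch under the same assumption that string is an AnsiString; the unit name MbcsTruncateSketch and the function TruncateMbcs are hypothetical and are not RTL routines.

```pascal
unit MbcsTruncateSketch;

interface

uses
  SysUtils;

// Hypothetical helper (not part of the RTL): returns at most MaxBytes
// bytes of S without cutting a multibyte character in half. The text
// is passed as a string, never as a single character.
function TruncateMbcs(const S: string; MaxBytes: Integer): string;

implementation

function TruncateMbcs(const S: string; MaxBytes: Integer): string;
var
  Len: Integer;
begin
  Len := Length(S);
  if MaxBytes < Len then
    Len := MaxBytes;
  // If the last byte to be kept is a lead byte, its trail byte would
  // fall outside the limit, so the lead byte is dropped as well.
  if (Len > 0) and (ByteType(S, Len) = mbLeadByte) then
    Dec(Len);
  Result := Copy(S, 1, Len);
end;

end.
```

The result can therefore be one byte shorter than MaxBytes. To limit a string by character count rather than byte count, CharToByteLen can convert the character limit into a byte length first.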
Copyright(C) 2008 CodeGear(TM). All Rights Reserved.