RAD Studio
Multibyte Character Sets

The ideographic character sets used in Asia cannot use a simple 1:1 mapping between the characters of the language and the one-byte (8-bit) char type, because these languages have too many characters to be represented in a single byte. Instead, a multibyte string uses one or more bytes per character. An AnsiString can contain a mix of single-byte and multibyte characters.

The lead byte of every multibyte character code is taken from a reserved range that depends on the specific character set. The second and subsequent bytes can be the same as the character code of a separate one-byte character, or they can fall in the range reserved for the first byte of multibyte characters. Thus, the only way to tell whether a particular byte in a string represents a single character or is part of a multibyte character is to read the string from the beginning, grouping two or more bytes into one character whenever a lead byte from the reserved range is encountered.
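The scan described above can be sketched as follows. This is a minimal illustration, not library code: the lead-byte ranges are hard-coded for Shift-JIS (code page 932) as an assumption, every multibyte character is assumed to be exactly two bytes long, and the names TByteSet, SJISLeadBytes, CountCharacters, and Sample are hypothetical. The sample data contains the byte pair $83 $5C, a single double-byte character whose trail byte has the same value as the one-byte character '\'.

```delphi
program LeadByteScan;
{$APPTYPE CONSOLE}

type
  TByteSet = set of Byte;

const
  // Assumption: lead-byte ranges of Shift-JIS (code page 932),
  // hard-coded here only for illustration.
  SJISLeadBytes: TByteSet = [$81..$9F, $E0..$FC];

// Counts characters by reading from the first byte onward.
// Assumes every multibyte character is exactly two bytes long.
function CountCharacters(const Bytes: array of Byte): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := Low(Bytes);
  while I <= High(Bytes) do
  begin
    if (Bytes[I] in SJISLeadBytes) and (I < High(Bytes)) then
      Inc(I, 2)  // lead byte: consume it together with its trail byte
    else
      Inc(I);    // single-byte character (or a dangling lead byte at the end)
    Inc(Result);
  end;
end;

const
  // 'A', then $83 $5C (one double-byte character), then 'B'
  Sample: array[0..3] of Byte = ($41, $83, $5C, $42);

begin
  // Four bytes, but only three characters
  WriteLn(CountCharacters(Sample));
end.
```

Looking at the byte $5C in isolation would misidentify it as a backslash; only the scan from the start of the data shows that it belongs to the preceding lead byte.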

When writing code for Asian locales, you must handle all string manipulation with functions that can parse strings into multibyte characters.

Delphi provides many such runtime library functions, as listed in the following table:

Runtime library functions

AdjustLineBreaks        AnsiStrLower            ExtractFileDir
AnsiCompareFileName     AnsiStrPos              ExtractFileExt
AnsiExtractQuotedStr    AnsiStrRScan            ExtractFileName
AnsiLastChar            AnsiStrScan             ExtractFilePath
AnsiLowerCase           AnsiStrUpper            ExtractRelativePath
AnsiLowerCaseFileName   AnsiUpperCase           FileSearch
AnsiPos                 AnsiUpperCaseFileName   IsDelimiter
AnsiQuotedStr           ByteToCharIndex         IsPathDelimiter
AnsiStrComp             ByteToCharLen           LastDelimiter
AnsiStrIComp            ByteType                StrByteType
AnsiStrLastChar         ChangeFileExt           StringReplace
AnsiStrLComp            CharToByteIndex         WrapText
AnsiStrLIComp           CharToByteLen
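
As a rough illustration of how two of the listed functions differ from their byte-oriented counterparts, the sketch below compares Length with ByteToCharLen and Pos with AnsiPos. It assumes a Delphi version in which string is an AnsiString and a system locale whose code page treats $83 as a lead byte (for example, code page 932); the sample string is hypothetical. On other locales, or in a Delphi version where string is a Unicode string, the results will differ, so no output is shown.

```delphi
program MbcsAwareSearch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S: string;
begin
  // Hypothetical sample: under code page 932, $83 $5C is one double-byte
  // character whose trail byte has the same value as '\'.
  S := 'A'#$83#$5C'B';

  // Length counts bytes; ByteToCharLen counts characters in those bytes.
  WriteLn('Length in bytes:      ', Length(S));
  WriteLn('Length in characters: ', ByteToCharLen(S, Length(S)));

  // Pos compares raw bytes and can match a trail byte;
  // AnsiPos parses the string into multibyte characters first.
  WriteLn('Pos finds "\" at:     ', Pos('\', S));
  WriteLn('AnsiPos finds "\" at: ', AnsiPos('\', S));
end.
```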
 

Remember that the length of a string in bytes does not necessarily correspond to its length in characters. Be careful not to truncate a string by cutting a multibyte character in half. Do not pass a character as a parameter to a function or procedure, since the size of a character cannot be known in advance. Instead, always pass a pointer to a character or a string.
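
One possible way to truncate without splitting a character is sketched below. SafeTruncate is a hypothetical helper, not part of the runtime library; it relies on ByteType from SysUtils and assumes a Delphi version in which string is an AnsiString and multibyte characters are at most two bytes long. It takes the whole string rather than a single character, in line with the advice above.

```delphi
program SafeTruncateDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns at most MaxBytes bytes of S without
// cutting a double-byte character in half.
function SafeTruncate(const S: string; MaxBytes: Integer): string;
begin
  if MaxBytes < 0 then
    MaxBytes := 0;
  if Length(S) <= MaxBytes then
    Result := S
  else
  begin
    // If the first byte past the cut is a trail byte, the byte before it
    // is its lead byte, so move the cut back by one byte.
    if ByteType(S, MaxBytes + 1) = mbTrailByte then
      Dec(MaxBytes);
    Result := Copy(S, 1, MaxBytes);
  end;
end;

begin
  // Plain single-byte text is cut exactly at the requested length.
  WriteLn(SafeTruncate('ABCDE', 3));
end.
```

When the cut would land between a lead byte and its trail byte, the result is one byte shorter than requested, which is usually preferable to leaving a dangling lead byte at the end of the string.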

Copyright(C) 2008 CodeGear(TM). All Rights Reserved.