RAD Studio VCL Reference
SysUtils.CharLength Function

Returns the number of bytes used by a character.

function CharLength(const S: AnsiString; Index: Integer): Integer; overload;
function CharLength(const S: UnicodeString; Index: Integer): Integer; overload;
int CharLength(const AnsiString S, int Index);
int CharLength(const UnicodeString S, int Index);

Call CharLength to determine the size, in bytes, of the character that starts at Index in S. If Index does not point to the start of a character, the function returns the size of the remainder of the character, not the full character length.

Note: Index is an element index into S, not a byte or character index. An element is 1 byte in an AnsiString and 2 bytes in a UnicodeString.
If S is an AnsiString and the system is not using a multibyte character set (MBCS), CharLength always returns 1.

The following example illustrates CharLength's operation.

type
  SJISString = type AnsiString(932);  // AnsiString bound to code page 932 (Shift-JIS)
var
  A: SJISString;
  L: Integer;
begin
  A := 'A' +
       'B' +
       #$82#$A0 +  // Japanese Hiragana 'A'
       #$82#$A2 +  // Japanese Hiragana 'I'
       #$82#$A4 +  // Japanese Hiragana 'U'
       'C';

  L := CharLength(A, 1);     // returns 1 ('A')
  L := CharLength(A, 2);     // returns 1 ('B')
  L := CharLength(A, 3);     // returns 2 (first byte of Hiragana 'A')
  L := CharLength(A, 4);     // returns 1 (second byte of Hiragana 'A')
end;

In this example, when the index is 1 or 2, it points to the beginning of a single-byte character, so the function returns 1. When the index is 3, it points to the beginning of a two-byte character, so the function returns 2. When the index is 4, it points to the second half of a two-byte character, so the function returns 1, the size of the remainder of that character. Note that the element size in this example is 1 byte: some characters occupy two elements and others occupy only one.

In general, CharLength is useful only when you walk a string from its beginning and advance the index past each character in turn, so that the index always points to the start of a character and never into the middle of a multibyte one.
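The walk described above can be sketched as follows. This is a minimal illustration, not part of SysUtils: CountAnsiCharacters is a hypothetical helper name, and the program assumes the Delphi 2009 unit name SysUtils used elsewhere in this topic. Because the element size of an AnsiString is 1 byte, the byte count that CharLength returns is also the number of elements to skip.

```pascal
program WalkAnsiString;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of SysUtils): counts the characters in S by
// stepping from one character start to the next.
function CountAnsiCharacters(const S: AnsiString): Integer;
var
  Index: Integer;
begin
  Result := 0;
  Index := 1;  // start at the beginning, which is always a character start
  while Index <= Length(S) do
  begin
    // AnsiString elements are 1 byte each, so the byte count returned by
    // CharLength equals the number of elements occupied by this character.
    Inc(Index, CharLength(S, Index));
    Inc(Result);
  end;
end;

begin
  // Plain ASCII text: every character occupies exactly one element.
  Writeln(CountAnsiCharacters('ABC'));
end.
```

For a string that contains only single-byte characters, the count equals Length(S). For a string with multibyte characters on an MBCS system, the count should be smaller than Length(S), because each multibyte character advances the index by more than one element.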

CharLength also works with Unicode strings:

var
  U: UnicodeString;
  L: Integer;
begin
  U := 'abc';
  L := SysUtils.CharLength(U, 1); // returns 2
  L := SysUtils.CharLength(U, 2); // returns 2

  U := #$20BB7;                   // one character stored as a surrogate pair
  L := SysUtils.CharLength(U, 1); // returns 4 (both elements of the pair)
  L := SysUtils.CharLength(U, 2); // returns 2 (second element only)
end;

Note that the element size is 2 bytes in this example, and the surrogate pair character consists of two elements.

When the string is 'abc', each character is a single two-byte element, so the function returns 2. String literals are Unicode by default.

For the surrogate pair, when the index is 1, it points to the first element of the character, so the function returns 4. When the index is 2, it points to the second element of the character, so the function returns 2.
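Because CharLength reports bytes while Index counts elements, a loop over a UnicodeString has to divide the result by the element size before advancing. The sketch below shows one way to do that. CountCharacters is a hypothetical helper, not a SysUtils routine, and the code assumes the behavior described above (2 bytes for a single-element character, 4 bytes at the first element of a surrogate pair).

```pascal
program CountUnicodeCharacters;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of SysUtils): counts characters in S,
// treating a surrogate pair as a single character.
function CountCharacters(const S: UnicodeString): Integer;
var
  Index: Integer;
begin
  Result := 0;
  Index := 1;
  while Index <= Length(S) do
  begin
    // CharLength returns a size in bytes; dividing by the element size
    // converts it to the number of elements to skip.
    Inc(Index, CharLength(S, Index) div SizeOf(WideChar));
    Inc(Result);
  end;
end;

var
  U: UnicodeString;
begin
  // 'a', then U+20BB7 written explicitly as its surrogate pair, then 'b'.
  U := 'a' + #$D842#$DFB7 + 'b';
  Writeln('Elements:   ', Length(U));
  Writeln('Characters: ', CountCharacters(U));
end.
```

Under the behavior documented in this topic, the sample string has four elements but only three characters, since the surrogate pair advances the index by two elements in one step.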


Copyright(C) 2009 Embarcadero Technologies, Inc. All Rights Reserved.