RAD Studio (Common)
ContentsIndex
PreviousUpNext
String Types

This topic describes the string data types available in the Delphi language. The following types are covered:

  • Short strings.
  • Long strings.
  • Wide (Unicode) strings.

A string represents a sequence of characters. Delphi supports the following predefined string types.  

String types  

Type  
Maximum length  
Memory required  
Used for  
ShortString  
255 characters  
2 to 256 bytes  
backward compatibility  
AnsiString  
~2^31 characters  
4 bytes to 2GB  
8-bit (ANSI) characters, DBCS ANSI, MBCS ANSI, etc.  
WideString  
~2^30 characters  
4 bytes to 2GB  
Unicode characters; multi-user servers and multi-language applications  

On the Win32 platform, AnsiString, sometimes called the long string, is the preferred type for most purposes. WideString is the preferred string type on the .NET platform. 

String types can be mixed in assignments and expressions; the compiler automatically performs required conversions. But strings passed by reference to a function or procedure (as var and out parameters) must be of the appropriate type. Strings can be explicitly cast to a different string type. 

The reserved word string functions like a generic type identifier. For example,

var S: string;

creates a variable S that holds a string. On the Win32 platform, the compiler interprets string (when it appears without a bracketed number after it) as AnsiString. On the .NET platform, the string type maps to the String class. You can use single byte character strings on the .NET platform, but you must explicitly declare them to be of type AnsiString.  

On the Win32 platform, you can use the {$H-} directive to turn string into ShortString. The {$H-} directive is deprecated on the .NET platform. 

The standard function Length returns the number of characters in a string. The SetLength procedure adjusts the length of a string. 

Comparison of strings is defined by the ordering of the characters in corresponding positions. Between strings of unequal length, each character in the longer string without a corresponding character in the shorter string takes on a greater-than value. For example, 'AB' is greater than 'A'; that is, 'AB' > 'A' returns True. Zero-length strings hold the lowest values. 

You can index a string variable just as you would an array. If S is a string variable and i an integer expression, S[i] represents the ith character - or, strictly speaking, the ith byte in S. For a ShortString or AnsiString, S[i] is of type AnsiChar; for a WideString, S[i] is of type WideChar. For single-byte (Western) locales, MyString[2] := 'A'; assigns the value A to the second character of MyString. The following code uses the standard AnsiUpperCase function to convert MyString to uppercase.

var I: Integer;
begin
   I := Length(MyString);
   while I > 0 do
    begin
       MyString[I] := AnsiUpperCase(MyString[I]);
       I := I - 1;
    end;
end;

Be careful indexing strings in this way, since overwriting the end of a string can cause access violations. Also, avoid passing long-string indexes as var parameters, because this results in inefficient code. 

You can assign the value of a string constant - or any other expression that returns a string - to a variable. The length of the string changes dynamically when the assignment is made. Examples:

MyString := 'Hello world!';
MyString := 'Hello' + 'world';
MyString := MyString + '!';
MyString := ' '; { space }
MyString := '';  { empty string }

A ShortString is 0 to 255 characters long. While the length of a ShortString can change dynamically, its memory is a statically allocated 256 bytes; the first byte stores the length of the string, and the remaining 255 bytes are available for characters. If S is a ShortString variable, Ord(S[0]), like Length(S), returns the length of S; assigning a value to S[0], like calling SetLength, changes the length of S. ShortString is maintained for backward compatibility only. 

The Delphi language supports short-string types - in effect, subtypes of ShortString - whose maximum length is anywhere from 0 to 255 characters. These are denoted by a bracketed numeral appended to the reserved word string. For example,

var MyString: string[100];

creates a variable called MyString whose maximum length is 100 characters. This is equivalent to the declarations

type CString = string[100];
var MyString: CString;

Variables declared in this way allocate only as much memory as the type requires - that is, the specified maximum length plus one byte. In our example, MyString uses 101 bytes, as compared to 256 bytes for a variable of the predefined ShortString type. 

When you assign a value to a short-string variable, the string is truncated if it exceeds the maximum length for the type. 

The standard functions High and Low operate on short-string type identifiers and variables. High returns the maximum length of the short-string type, while Low returns zero.

AnsiString, also called a long string, represents a dynamically allocated string whose maximum length is limited only by available memory. 

A long-string variable is a pointer occupying four bytes of memory. When the variable is empty - that is, when it contains a zero-length stringthe pointer is nil and the string uses no additional storage. When the variable is nonempty, it points a dynamically allocated block of memory that contains the string value. The eight bytes before the location contain a 32-bit length indicator and a 32-bit reference count. This memory is allocated on the heap, but its management is entirely automatic and requires no user code. 

Because long-string variables are pointers, two or more of them can reference the same value without consuming additional memory. The compiler exploits this to conserve resources and execute assignments faster. Whenever a long-string variable is destroyed or assigned a new value, the reference count of the old string (the variable's previous value) is decremented and the reference count of the new value (if there is one) is incremented; if the reference count of a string reaches zero, its memory is deallocated. This process is called reference-counting. When indexing is used to change the value of a single character in a string, a copy of the string is made if - but only if - its reference count is greater than one. This is called copy-on-write semantics.

The WideString type represents a dynamically allocated string of 16-bit Unicode characters. In most respects it is similar to AnsiString. On Win32, WideString is compatible with the COM BSTR type.

Note: Under Win32, WideString values are not reference-counted.
The Win32 platform supports single-byte and multibyte character sets as well as Unicode. With a single-byte character set (SBCS), each byte in a string represents one character. 

In a multibyte character set (MBCS), some characters are represented by one byte and others by more than one byte. The first byte of a multibyte character is called the lead byte. In general, the lower 128 characters of a multibyte character set map to the 7-bit ASCII characters, and any byte whose ordinal value is greater than 127 is the lead byte of a multibyte character. The null value (#0) is always a single-byte character. Multibyte character sets - especially double-byte character sets (DBCS) - are widely used for Asian languages. 

In the Unicode character set, each character is represented by two bytes. Thus a Unicode string is a sequence not of individual bytes but of two-byte words. Unicode characters and strings are also called wide characters and wide character strings. The first 256 Unicode characters map to the ANSI character set. The Windows operating system supports Unicode (UCS-2). 

The Delphi language supports single-byte and multibyte characters and strings through the Char, PChar, AnsiChar, PAnsiChar, and AnsiString types. Indexing of multibyte strings is not reliable, since S[i] represents the ith byte (not necessarily the ith character) in S. However, the standard string-handling functions have multibyte-enabled counterparts that also implement locale-specific ordering for characters. (Names of multibyte functions usually start with Ansi-. For example, the multibyte version of StrPos is AnsiStrPos.) Multibyte character support is operating-system dependent and based on the current locale. 

Delphi supports Unicode characters and strings through the WideChar, PWideChar, and WideString types.

Many programming languages, including C and C++, lack a dedicated string data type. These languages, and environments that are built with them, rely on null-terminated strings. A null-terminated string is a zero-based array of characters that ends with NUL (#0); since the array has no length indicator, the first NUL character marks the end of the string. You can use Delphi constructions and special routines in the SysUtils unit (see Standard routines and I/O) to handle null-terminated strings when you need to share data with systems that use them. 

For example, the following type declarations could be used to store null-terminated strings.

type
  TIdentifier = array[0..15] of Char;
  TFileName = array[0..259] of Char;
  TMemoText = array[0..1023] of WideChar;

With extended syntax enabled ({$X+}), you can assign a string constant to a statically allocated zero-based character array. (Dynamic arrays won't work for this purpose.) If you initialize an array constant with a string that is shorter than the declared length of the array, the remaining characters are set to #0.

Using Pointers, Arrays, and String Constants

To manipulate null-terminated strings, it is often necessary to use pointers. (See Pointers and pointer types.) String constants are assignment-compatible with the PChar and PWideChar types, which represent pointers to null-terminated arrays of Char and WideChar values. For example,

var P: PChar;
  ...                    
P := 'Hello world!'

points P to an area of memory that contains a null-terminated copy of 'Hello world!' This is equivalent to

const TempString: array[0..12] of Char = 'Hello world!';
var P: PChar;
   ...
P := @TempString[0];

You can also pass string constants to any function that takes value or const parameters of type PChar or PWideChar - for example StrUpper('Hello world!'). As with assignments to a PChar, the compiler generates a null-terminated copy of the string and gives the function a pointer to that copy. Finally, you can initialize PChar or PWideChar constants with string literals, alone or in a structured type. Examples:

const
  Message: PChar = 'Program terminated';
  Prompt: PChar = 'Enter values: ';
  Digits: array[0..9] of PChar = ('Zero', 'One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight', 'Nine');

Zero-based character arrays are compatible with PChar and PWideChar. When you use a character array in place of a pointer value, the compiler converts the array to a pointer constant whose value corresponds to the address of the first element of the array. For example,

var
  MyArray: array[0..32] of Char;
  MyPointer: PChar;
begin
  MyArray := 'Hello';
  MyPointer := MyArray;
  SomeProcedure(MyArray);
  SomeProcedure(MyPointer);
end;

This code calls SomeProcedure twice with the same value. 

A character pointer can be indexed as if it were an array. In the previous example, MyPointer[0] returns H. The index specifies an offset added to the pointer before it is dereferenced. (For PWideChar variables, the index is automatically multiplied by two.) Thus, if P is a character pointer, P[0] is equivalent to P^ and specifies the first character in the array, P[1] specifies the second character in the array, and so forth; P[-1] specifies the 'character' immediately to the left of P[0]. The compiler performs no range checking on these indexes. 

The StrUpper function illustrates the use of pointer indexing to iterate through a null-terminated string:

function StrUpper(Dest, Source: PChar; MaxLen: Integer): PChar;
var
  I: Integer;
begin
  I := 0;
  while (I < MaxLen) and (Source[I] <> #0) do
  begin
    Dest[I] := UpCase(Source[I]);
    Inc(I);
  end;
  Dest[I] := #0;
  Result := Dest;
end;

 

Mixing Delphi Strings and Null-Terminated Strings

You can mix long strings (AnsiString values) and null-terminated strings (PChar values) in expressions and assignments, and you can pass PChar values to functions or procedures that take long-string parameters. The assignment S := P, where S is a string variable and P is a PChar expression, copies a null-terminated string into a long string. 

In a binary operation, if one operand is a long string and the other a PChar, the PChar operand is converted to a long string. 

You can cast a PChar value as a long string. This is useful when you want to perform a string operation on two PChar values. For example,

S := string(P1) + string(P2);

You can also cast a long string as a null-terminated string. The following rules apply.

  • If S is a long-string expression, PChar(S) casts S as a null-terminated string; it returns a pointer to the first character in S. For example, if Str1 and Str2 are long strings, you could call the Win32 API MessageBox function like this: MessageBox(0, PChar(Str1), PChar(Str2), MB_OK);
  • You can also use Pointer(S) to cast a long string to an untyped pointer. But if S is empty, the typecast returns nil.
  • PChar(S) always returns a pointer to a memory block; if S is empty, a pointer to #0 is returned.
  • When you cast a long-string variable to a pointer, the pointer remains valid until the variable is assigned a new value or goes out of scope. If you cast any other long-string expression to a pointer, the pointer is valid only within the statement where the typecast is performed.
  • When you cast a long-string expression to a pointer, the pointer should usually be considered read-only. You can safely use the pointer to modify the long string only when all of the following conditions are satisfied.
  • The expression cast is a long-string variable.
  • The string is not empty.
  • The string is unique - that is, has a reference count of one. To guarantee that the string is unique, call the SetLength, SetString, or UniqueString procedure.
  • The string has not been modified since the typecast was made.
  • The characters modified are all within the string. Be careful not to use an out-of-range index on the pointer.
The same rules apply when mixing WideString values with PWideChar values.

Copyright(C) 2008 CodeGear(TM). All Rights Reserved.
What do you think about this topic? Send feedback!