Internal Data Formats

The following topics describe the internal formats of Delphi data types.

Integer Types

The format of an integer-type variable depends on its minimum and maximum bounds.

If both bounds are within the range -128..127 (Shortint), the variable is stored as a signed byte.
If both bounds are within the range 0..255 (Byte), the variable is stored as an unsigned byte.
If both bounds are within the range -32768..32767 (Smallint), the variable is stored as a signed word.
If both bounds are within the range 0..65535 (Word), the variable is stored as an unsigned word.
If both bounds are within the range -2147483648..2147483647 (Longint), the variable is stored as a signed double word.
If both bounds are within the range 0..4294967295 (Longword), the variable is stored as an unsigned double word.
Otherwise, the variable is stored as a signed quadruple word (Int64).

Note: a "word" occupies two bytes.
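
The subrange rules above can be checked with SizeOf. This is a minimal sketch for a Win32 console program; the type names (TTiny and so on) are hypothetical, with bounds chosen so that each lands in a different row of the list.

```pascal
program IntegerSizes;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  // Hypothetical subrange types, one per storage class
  TTiny     = -100..100;        // fits Shortint: signed byte
  TSmallPos = 0..200;           // fits Byte: unsigned byte
  TMedium   = -1000..1000;      // fits Smallint: signed word
  TWordish  = 0..60000;         // fits Word: unsigned word
  TLarge    = -100000..100000;  // fits Longint: signed double word

begin
  Assert(SizeOf(TTiny) = 1);
  Assert(SizeOf(TSmallPos) = 1);
  Assert(SizeOf(TMedium) = 2);
  Assert(SizeOf(TWordish) = 2);
  Assert(SizeOf(TLarge) = 4);
  Writeln('ok');
end.
```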

Character Types

On the Win32 platform, an AnsiChar, or a subrange of the AnsiChar type, is stored as an unsigned byte. A WideChar is stored as an unsigned word.
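
A minimal check of the character sizes and their unsigned ordinal ranges, assuming a Unicode-enabled compiler (Delphi 2009 or later) targeting Win32:

```pascal
program CharSizes;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

begin
  // AnsiChar: one unsigned byte, ordinal range 0..255
  Assert(SizeOf(AnsiChar) = 1);
  Assert(Ord(Low(AnsiChar)) = 0);
  Assert(Ord(High(AnsiChar)) = 255);
  // WideChar: one unsigned word, ordinal range 0..65535
  Assert(SizeOf(WideChar) = 2);
  Assert(Ord(High(WideChar)) = 65535);
  Writeln('ok');
end.
```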

Boolean Types

A Boolean type is stored as a Byte, a ByteBool is stored as a Byte, a WordBool type is stored as a Word, and a LongBool is stored as a Longint.

A Boolean can assume the values 0 (False) and 1 (True). ByteBool, WordBool, and LongBool types can assume the values 0 (False) or nonzero (True).
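
The sizes above, and the difference between Boolean and the nonzero-is-True types, can be observed directly. This is a sketch for a Win32 console program; the value 2 is an arbitrary nonzero example.

```pascal
program BooleanSizes;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  LB: LongBool;
  SawTrue: Boolean;
begin
  // Storage sizes
  Assert(SizeOf(Boolean) = 1);
  Assert(SizeOf(ByteBool) = 1);
  Assert(SizeOf(WordBool) = 2);
  Assert(SizeOf(LongBool) = 4);

  // Boolean uses exactly 0 and 1
  Assert(Ord(False) = 0);
  Assert(Ord(True) = 1);

  // LongBool treats any nonzero value as True
  LB := LongBool(2);
  SawTrue := False;
  if LB then
    SawTrue := True;
  Assert(SawTrue);
  Writeln('ok');
end.
```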

Enumerated Types

An enumerated type is stored as an unsigned byte if the enumeration has no more than 256 values and the type was declared in the {$Z1} state (the default). If an enumerated type has more than 256 values, or if the type was declared in the {$Z2} state, it is stored as an unsigned word. If an enumerated type is declared in the {$Z4} state, it is stored as an unsigned double-word.
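
The effect of the {$Z} directive can be seen by declaring the same enumeration under each state. The type names are hypothetical, and each uses distinct identifiers because enumeration constants share one scope.

```pascal
program EnumSizes;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
{$Z1}
  TColor1 = (Red1, Green1, Blue1);  // unsigned byte
{$Z2}
  TColor2 = (Red2, Green2, Blue2);  // unsigned word
{$Z4}
  TColor4 = (Red4, Green4, Blue4);  // unsigned double word
{$Z1}

begin
  Assert(SizeOf(TColor1) = 1);
  Assert(SizeOf(TColor2) = 2);
  Assert(SizeOf(TColor4) = 4);
  Writeln('ok');
end.
```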

Real Types

The real types store the binary representation of a sign (+ or -), an exponent, and a significand. A real value has the form

$\pm\,\text{significand} \cdot 2^{\text{exponent}}$

where the significand has a single bit to the left of the binary point (that is, 0 <= significand < 2).

In the figures that follow, the most significant bit is always on the left and the least significant bit on the right. The numbers at the top indicate the width (in bits) of each field, with the left-most items stored at the highest addresses. For example, for a Real48 value, e is stored in the first byte, f in the following five bytes, and s in the most significant bit of the last byte.

The Real48 type

On the Win32 platform, a 6-byte (48-bit) Real48 number is divided into three fields:

1	39	8
s	f	e

If 0 < e <= 255, the value v of the number is given by

$v = (-1)^s \cdot 2^{e-129} \cdot (1.f)$

If e = 0, then v = 0.

The Real48 type cannot store denormals, NaNs, or infinities. Denormals become zero when stored in a Real48, while NaNs and infinities produce an overflow error if an attempt is made to store them in a Real48.
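
The following sketch decodes the six bytes of a Real48 by applying the formula above, assuming a Win32 target where Real48 is available. DecodeReal48 is a hypothetical helper, not an RTL routine; it uses IntPower from the Math unit.

```pascal
program Real48Decode;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  Math;  // IntPower

// Hypothetical helper: v = (-1)^s * 2^(e-129) * (1.f)
function DecodeReal48(const B: array of Byte): Double;
var
  I: Integer;
  F: Double;
begin
  if B[0] = 0 then          // e = 0 means the value is zero
    Exit(0);
  // f: 39 bits in bytes 1..5; the low 7 bits of byte 5 are the top of f
  F := B[5] and $7F;
  for I := 4 downto 1 do
    F := F * 256 + B[I];
  Result := (1 + F / IntPower(2, 39)) * IntPower(2, B[0] - 129);
  if (B[5] and $80) <> 0 then  // s is the top bit of the last byte
    Result := -Result;
end;

var
  R: Real48;
  Raw: array[0..5] of Byte absolute R;
begin
  R := -1.5;
  Assert(Raw[0] = $81);     // e = 129, so the scale factor is 2^0
  Assert(Raw[5] = $C0);     // s = 1 plus the leading fraction bit (.5)
  Assert(DecodeReal48(Raw) = -1.5);
  Writeln('ok');
end.
```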

The Single type

A 4-byte (32-bit) Single number is divided into three fields:

1	8	23
s	e	f

The value v of the number is given by

if 0 < e < 255, then $v = (-1)^s \cdot 2^{e-127} \cdot (1.f)$

if e = 0 and f <> 0, then $v = (-1)^s \cdot 2^{-126} \cdot (0.f)$

if e = 0 and f = 0, then $v = (-1)^s \cdot 0$

if e = 255 and f = 0, then $v = (-1)^s \cdot \infty$

if e = 255 and f <> 0, then v is a NaN
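
The case analysis for Single maps directly onto bit masks. DecodeSingle below is a hypothetical helper that applies each rule to the raw 32 bits and returns a Double; it relies on IntPower, Infinity, NaN, IsNan, and IsInfinite from the Math unit.

```pascal
program SingleDecode;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  Math;  // IntPower, Infinity, NaN, IsNan, IsInfinite

// Hypothetical helper: applies the Single rules to raw bits
function DecodeSingle(Bits: Cardinal): Double;
var
  S, E, F: Cardinal;
begin
  S := Bits shr 31;            // 1-bit sign
  E := (Bits shr 23) and $FF;  // 8-bit biased exponent
  F := Bits and $7FFFFF;       // 23-bit fraction
  if E = 255 then
  begin
    if F <> 0 then
      Exit(NaN);               // e = 255, f <> 0
    Result := Infinity;        // e = 255, f = 0
  end
  else if E = 0 then
    Result := (F / IntPower(2, 23)) * IntPower(2, -126)  // 0.f: zero or denormal
  else
    Result := (1 + F / IntPower(2, 23)) * IntPower(2, Integer(E) - 127);  // 1.f
  if S = 1 then
    Result := -Result;
end;

var
  X: Single;
  Bits: Cardinal absolute X;
begin
  X := -0.75;                                   // -1.1 (binary) * 2^-1
  Assert(Bits = $BF400000);                     // s = 1, e = 126, f = $400000
  Assert(DecodeSingle(Bits) = -0.75);
  Assert(DecodeSingle(0) = 0);                  // e = 0, f = 0
  Assert(IsInfinite(DecodeSingle($7F800000)));  // e = 255, f = 0
  Assert(IsNan(DecodeSingle($7FC00000)));       // e = 255, f <> 0
  Writeln('ok');
end.
```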

The Double type

An 8-byte (64-bit) Double number is divided into three fields:

1	11	52
s	e	f

The value v of the number is given by

if 0 < e < 2047, then $v = (-1)^s \cdot 2^{e-1023} \cdot (1.f)$

if e = 0 and f <> 0, then $v = (-1)^s \cdot 2^{-1022} \cdot (0.f)$

if e = 0 and f = 0, then $v = (-1)^s \cdot 0$

if e = 2047 and f = 0, then $v = (-1)^s \cdot \infty$

if e = 2047 and f <> 0, then v is a NaN
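
Extracting the three Double fields works the same way with 64-bit masks. This sketch overlays a UInt64 on a Double with absolute and checks the fields of -0.75, which is -1.1 (binary) * 2^-1.

```pascal
program DoubleFields;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  D: Double;
  Bits: UInt64 absolute D;
  S, E: Integer;
  F: UInt64;
begin
  D := -0.75;
  S := Integer(Bits shr 63);                // 1-bit sign
  E := Integer((Bits shr 52) and $7FF);     // 11-bit biased exponent
  F := Bits and $000FFFFFFFFFFFFF;          // 52-bit fraction
  Assert(S = 1);
  Assert(E - 1023 = -1);                    // unbiased exponent
  Assert(F = UInt64(1) shl 51);             // f = .1000... (binary)
  Writeln('ok');
end.
```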

The Extended type

On the Win32 platform, a 10-byte (80-bit) Extended number is divided into four fields:

1	15	1	63
s	e	i	f

The value v of the number is given by

if 0 <= e < 32767, then $v = (-1)^s \cdot 2^{e-16383} \cdot (i.f)$

if e = 32767 and f = 0, then $v = (-1)^s \cdot \infty$

if e = 32767 and f <> 0, then v is a NaN
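
Unlike Single and Double, Extended stores the integer bit i explicitly. The overlay record TExtendedBits below is hypothetical and only matches the 10-byte layout on Win32 (on Win64, Extended is the same as Double, so the size check would fail there).

```pascal
program ExtendedFields;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  // Hypothetical overlay for the 10-byte Win32 Extended
  TExtendedBits = packed record
    Mantissa: UInt64;  // bit 63 is i, bits 0..62 are f
    SignExp: Word;     // bit 15 is s, bits 0..14 are e
  end;

var
  X: Extended;
  Raw: TExtendedBits absolute X;
begin
  Assert(SizeOf(Extended) = 10);
  X := 1.0;
  Assert((Raw.SignExp shr 15) = 0);                         // s = 0
  Assert((Raw.SignExp and $7FFF) = 16383);                  // e - 16383 = 0
  Assert((Raw.Mantissa shr 63) = 1);                        // explicit i = 1
  Assert((Raw.Mantissa and not (UInt64(1) shl 63)) = 0);    // f = 0
  Writeln('ok');
end.
```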

The Comp type

An 8-byte (64-bit) Comp number is stored as a signed 64-bit integer.

The Currency type

An 8-byte (64-bit) Currency number is stored as a scaled and signed 64-bit integer with the four least-significant digits implicitly representing four decimal places.
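
Because both Comp and Currency are 64-bit integers internally, an Int64 overlay exposes the stored value. For Currency, the stored integer is the decimal value multiplied by 10000. A sketch for a Win32 console program:

```pascal
program CompCurrency;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  C: Comp;
  CRaw: Int64 absolute C;
  Money: Currency;
  MRaw: Int64 absolute Money;
begin
  C := 42;
  Assert(CRaw = 42);      // Comp is a plain signed 64-bit integer
  Money := 1.2345;
  Assert(MRaw = 12345);   // scaled by 10000: four implied decimal places
  Money := -0.5;
  Assert(MRaw = -5000);
  Writeln('ok');
end.
```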

Pointer Types

On the Win32 platform, a Pointer type is stored in 4 bytes as a 32-bit address. The pointer value nil is stored as zero.
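
A short check of the pointer size and the representation of nil, assuming a Win32 target (on Win64 a pointer occupies 8 bytes):

```pascal
program PointerSize;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  P: Pointer;
begin
  Assert(SizeOf(Pointer) = 4);  // 32-bit address
  P := nil;
  Assert(NativeUInt(P) = 0);    // nil is stored as zero
  Writeln('ok');
end.
```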

Short String Types

A short string occupies as many bytes as its maximum length plus one. The first byte contains the current dynamic length of the string, and the following bytes contain the characters of the string.

The length byte and the characters are considered unsigned values. The maximum string length is 255 characters plus a length byte (string[255]).
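
The length byte is visible as element 0 of a short string. The sketch below declares a hypothetical string[10] variable:

```pascal
program ShortStringLayout;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  S: string[10];
begin
  Assert(SizeOf(S) = 11);             // maximum length 10 plus the length byte
  Assert(SizeOf(ShortString) = 256);  // string[255]
  S := 'abc';
  Assert(Ord(S[0]) = 3);              // byte 0 holds the current length
  Assert(S[1] = 'a');                 // characters start at byte 1
  Writeln('ok');
end.
```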

String Types

A string variable of type UnicodeString or AnsiString occupies four bytes of memory which contain a pointer to a dynamically allocated string. When a string variable is empty (contains a zero-length string), the string pointer is nil and no dynamic memory is associated with the string variable. For a nonempty string value, the string pointer points to a dynamically allocated block of memory that contains the string value in addition to information describing the string. The table below shows the layout of a long-string memory block.

String dynamic memory layout (Win32 only)

Offset	Contents
-12	16-bit code page of string data
-10	16-bit element size of string data
-8	32-bit reference count
-4	32-bit length (number of elements)
0..Length * ElementSize - 1	character data (Length elements of ElementSize bytes each)
Length * ElementSize	NULL character

The NULL character at the end of a string memory block is automatically maintained by the compiler and the built-in string handling routines. This makes it possible to typecast a string directly to a null-terminated string.

For string constants and literals, the compiler generates a memory block with the same layout as a dynamically allocated string, but with a reference count of -1. When a string variable is assigned a string constant, the string pointer is assigned the address of the memory block generated for the string constant. The built-in string handling routines know not to attempt to modify blocks that have a reference count of -1.
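
The header fields can be read through negative offsets from the string pointer. This sketch assumes a Unicode-enabled compiler (Delphi 2009 or later) and calls UniqueString so that S owns a heap block with a reference count of 1 rather than pointing at a literal. Note that the length field counts elements (WideChars), not bytes, so it reads 5 for 'Hello'.

```pascal
program StringHeader;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}
{$POINTERMATH ON}

var
  S: UnicodeString;
  P: PByte;
begin
  S := 'Hello';
  UniqueString(S);                     // force a private heap copy
  P := PByte(Pointer(S));
  Assert(PWord(P - 12)^ = 1200);       // code page: 1200 is UTF-16
  Assert(PWord(P - 10)^ = 2);          // element size: 2 bytes per WideChar
  Assert(PInteger(P - 8)^ = 1);        // reference count
  Assert(PInteger(P - 4)^ = 5);        // length in elements
  Assert(PWideChar(P + 5 * 2)^ = #0);  // terminator at Length * ElementSize
  Writeln('ok');
end.
```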

Wide String Types

On Win32, a wide string variable occupies four bytes of memory which contain a pointer to a dynamically allocated string. When a wide string variable is empty (contains a zero-length string), the string pointer is nil and no dynamic memory is associated with the string variable. For a nonempty string value, the string pointer points to a dynamically allocated block of memory that contains the string value in addition to a 32-bit length indicator. The table below shows the layout of a wide string memory block on Windows.

Wide string dynamic memory layout (Win32 only)

Offset	Contents
-4	32-bit length indicator (in bytes)
0..Length - 1	character string
Length	NULL character

The string length is the number of bytes, so it is twice the number of wide characters contained in the string.

The NULL character at the end of a wide string memory block is automatically maintained by the compiler and the built-in string handling routines. This makes it possible to typecast a wide string directly to a null-terminated string.
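
For comparison with the long-string header, the WideString length prefix counts bytes, while Length reports characters. A sketch assuming Win32:

```pascal
program WideStringHeader;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}
{$POINTERMATH ON}

var
  W: WideString;
  P: PByte;
begin
  W := 'Hello';
  P := PByte(Pointer(W));
  Assert(PInteger(P - 4)^ = 10);    // length prefix counts bytes
  Assert(Length(W) = 5);            // Length counts wide characters
  Assert(PWideChar(P + 10)^ = #0);  // terminator right after the data
  Writeln('ok');
end.
```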

Set Types

A set is a bit array where each bit indicates whether an element is in the set or not. The maximum number of elements in a set is 256, so a set never occupies more than 32 bytes. The number of bytes occupied by a particular set is equal to (Max div 8) - (Min div 8) + 1, where Max and Min are the upper and lower bounds of the base type of the set. The byte number of a specific element E is (E div 8) - (Min div 8), and the bit number within that byte is E mod 8, where E denotes the ordinal value of the element. When possible, the compiler stores sets in CPU registers, but a set always resides in memory if it is larger than the generic Integer type or if the program contains code that takes the address of the set.
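
The byte-count and bit-position formulas can be checked against the compiler. The set type below is hypothetical; its base range 8..23 gives (23 div 8) - (8 div 8) + 1 = 2 bytes, and element 20 lands in byte (20 div 8) - (8 div 8) = 1, bit 20 mod 8 = 4.

```pascal
program SetLayout;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  TRange = 8..23;
  TRangeSet = set of TRange;

// Byte count from the formula in the text
function SetSize(Min, Max: Integer): Integer;
begin
  Result := (Max div 8) - (Min div 8) + 1;
end;

var
  S: TRangeSet;
  Raw: array[0..1] of Byte absolute S;
begin
  Assert(SizeOf(TRangeSet) = SetSize(Low(TRange), High(TRange)));  // 2 bytes
  S := [20];
  Assert(Raw[0] = 0);       // elements 8..15 live in byte 0
  Assert(Raw[1] = 1 shl 4); // element 20: byte 1, bit 4
  Writeln('ok');
end.
```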

Static Array Types

On the Win32 platform, a static array is stored as a contiguous sequence of variables of the component type of the array. The components with the lowest indexes are stored at the lowest memory addresses. A multidimensional array is stored with the rightmost dimension increasing first.
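
Storing the rightmost dimension first means the rightmost index varies fastest. A sketch with a hypothetical 2 x 3 array, measuring element addresses:

```pascal
program StaticArrayLayout;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  A: array[0..1, 0..2] of Integer;  // 2 rows of 3 Integers
begin
  Assert(SizeOf(A) = 6 * SizeOf(Integer));
  // A[0,1] immediately follows A[0,0]
  Assert(NativeInt(@A[0, 1]) - NativeInt(@A[0, 0]) = SizeOf(Integer));
  // A[1,0] comes after the whole first row of 3 elements
  Assert(NativeInt(@A[1, 0]) - NativeInt(@A[0, 0]) = 3 * SizeOf(Integer));
  Writeln('ok');
end.
```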

Dynamic Array Types

On the Win32 platform, a dynamic-array variable occupies four bytes of memory which contain a pointer to the dynamically allocated array. When the variable is empty (uninitialized) or holds a zero-length array, the pointer is nil and no dynamic memory is associated with the variable. For a nonempty array, the variable points to a dynamically allocated block of memory that contains the array in addition to a 32-bit length indicator and a 32-bit reference count. The table below shows the layout of a dynamic-array memory block.

Dynamic array memory layout (Win32 only)

Offset	Contents
-8	32-bit reference-count
-4	32-bit length indicator (number of elements)
0..Length * (size of element) -1	array elements
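
The length and reference count can be read much like the string header. This sketch assumes a Win32 target (on Win64 the length field is 64-bit and the offsets differ) and shows the reference count rising when a second variable shares the block. TIntArray is a hypothetical type name.

```pascal
program DynArrayHeader;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}
{$POINTERMATH ON}

type
  TIntArray = array of Integer;

var
  A, B: TIntArray;
  P: PByte;
begin
  SetLength(A, 3);
  P := PByte(Pointer(A));
  Assert(PInteger(P - 4)^ = 3);  // length: number of elements
  Assert(PInteger(P - 8)^ = 1);  // reference count
  B := A;                        // shares the block instead of copying
  Assert(PInteger(P - 8)^ = 2);
  Writeln('ok');
end.
```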

Record Types

When a record type is declared in the {$A+} state (the default), and when the declaration does not include a packed modifier, the type is an unpacked record type, and the fields of the record are aligned for efficient access by the CPU. The alignment is controlled by the type of each field and by whether fields are declared together. Every data type has an inherent alignment, which is automatically computed by the compiler. The alignment can be 1, 2, 4, or 8, and represents the byte boundary that a value of the type must be stored on to provide the most efficient access. The table below lists the alignments for all data types.

Type alignment masks (Win32 only)

Type	Alignment
Ordinal types	size of the type (1, 2, 4, or 8)
Real types	2 for Real48, 4 for Single, 8 for Double and Extended
Short string types	1
Array types	same as the element type of the array.
Record types	the largest alignment of the fields in the record
Set types	size of the type if 1, 2, or 4, otherwise 1
All other types	determined by the $A directive.

To ensure proper alignment of the fields in an unpacked record type, the compiler inserts an unused byte before fields with an alignment of 2, up to three unused bytes before fields with an alignment of 4, and up to seven unused bytes before fields with an alignment of 8, if required. Finally, the compiler rounds the total size of the record upward to the byte boundary specified by the largest alignment of any of the fields.
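
The padding rules can be verified by measuring field offsets. The hypothetical record TSample below needs three unused bytes before B and two trailing bytes to round its size up to a multiple of 4:

```pascal
program RecordAlignment;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  TSample = record  // unpacked, default {$A+} state
    A: Byte;        // offset 0
    B: Integer;     // alignment 4: three unused bytes, offset 4
    C: Word;        // alignment 2: offset 8
  end;              // 10 bytes of data, rounded up to 12

var
  R: TSample;
begin
  Assert(NativeInt(@R.B) - NativeInt(@R) = 4);
  Assert(NativeInt(@R.C) - NativeInt(@R) = 8);
  Assert(SizeOf(TSample) = 12);
  Writeln('ok');
end.
```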

If two fields share a common type specification, they are packed even if the declaration does not include the packed modifier and the record type is not declared in the {$A-} state. Thus, for example, given the following declaration


type
  TMyRecord = record
    A, B: Extended;  
    C: Extended;
  end;

A and B are packed (aligned on byte boundaries) because they share the same type specification. The compiler pads the structure with unused bytes to ensure that C appears on a quadword boundary.

When a record type is declared in the {$A-} state, or when the declaration includes the packed modifier, the fields of the record are not aligned, but are instead assigned consecutive offsets. The total size of such a packed record is simply the sum of the sizes of its fields. Because data alignment can change, it is a good idea to pack any record structure that you intend to write to disk or pass in memory to another module compiled using a different version of the compiler.
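
With the packed modifier, the same three fields take consecutive offsets and the size is the plain sum 1 + 4 + 2 = 7. The type name is hypothetical:

```pascal
program PackedRecord;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  TPackedSample = packed record
    A: Byte;     // offset 0
    B: Integer;  // offset 1, no padding
    C: Word;     // offset 5
  end;           // size 7

var
  R: TPackedSample;
begin
  Assert(NativeInt(@R.B) - NativeInt(@R) = 1);
  Assert(NativeInt(@R.C) - NativeInt(@R) = 5);
  Assert(SizeOf(TPackedSample) = 7);
  Writeln('ok');
end.
```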

File Types

On the Win32 platform, file types are represented as records. Typed files and untyped files occupy 592 bytes, which are laid out as follows:


type
  TFileRec = packed record
    Handle: Integer;
    Mode: Word;
    Flags: Word;
    case Byte of
      0: (RecSize: Cardinal);
      1: (BufSize: Cardinal;
          BufPos: Cardinal;
          BufEnd: Cardinal;
          BufPtr: PChar;
          OpenFunc: Pointer;
          InOutFunc: Pointer;
          FlushFunc: Pointer;
          CloseFunc: Pointer;
          UserData: array[1..32] of Byte;
          Name: array[0..259] of Char; );
  end;

Text files occupy 848 bytes, which are laid out as follows:


type
  TTextBuf = array[0..127] of Char;
  TTextRec = packed record
    Handle: Integer;
    Mode: Word;
    Flags: Word;
    BufSize: Cardinal;
    BufPos: Cardinal;
    BufEnd: Cardinal;
    BufPtr: PChar;
    OpenFunc: Pointer;
    InOutFunc: Pointer;
    FlushFunc: Pointer;
    CloseFunc: Pointer;
    UserData: array[1..32] of Byte;
    Name: array[0..259] of Char;
    Buffer: TTextBuf;
  end;

Handle contains the file's handle (when the file is open).

The Mode field can assume one of the values


const
  fmClosed = $D7B0;
  fmInput  = $D7B1;
  fmOutput = $D7B2;
  fmInOut  = $D7B3;

where fmClosed indicates that the file is closed; fmInput and fmOutput indicate a text file that has been reset (fmInput) or rewritten (fmOutput); and fmInOut indicates a typed or untyped file that has been reset or rewritten. Any other value indicates that the file variable is not assigned (and hence not initialized).

The UserData field is available for user-written routines to store data in.

Name contains the file name, which is a sequence of characters terminated by a null character (#0).

For typed files and untyped files, RecSize contains the record length in bytes.

For text files, BufPtr is a pointer to a buffer of BufSize bytes, BufPos is the index of the next character in the buffer to read or write, and BufEnd is a count of valid characters in the buffer. OpenFunc, InOutFunc, FlushFunc, and CloseFunc are pointers to the I/O routines that control the file; see Device functions. Flags determines the line break style as follows:

bit 0 clear	LF line breaks
bit 0 set	CRLF line breaks

All other Flags bits are reserved for future use.
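
The Mode and Name fields can be inspected by typecasting a file variable to TTextRec. This sketch assumes that TTextRec and the fm* constants are reachable through System or SysUtils (their home unit has varied between compiler versions), and it creates and then erases a scratch file with the hypothetical name modes_demo.txt in the current directory.

```pascal
program FileRecMode;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  SysUtils;

var
  T: TextFile;
begin
  AssignFile(T, 'modes_demo.txt');  // hypothetical scratch file name
  Assert(TTextRec(T).Mode = fmClosed);
  Assert(string(PChar(@TTextRec(T).Name[0])) = 'modes_demo.txt');
  Rewrite(T);                       // creates or truncates the file
  Assert(TTextRec(T).Mode = fmOutput);
  CloseFile(T);
  Assert(TTextRec(T).Mode = fmClosed);
  Erase(T);                         // remove the scratch file
  Writeln('ok');
end.
```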

Procedural Types

On the Win32 platform, a procedure pointer is stored as a 32-bit pointer to the entry point of a procedure or function. A method pointer is stored as a 32-bit pointer to the entry point of a method, followed by a 32-bit pointer to an object.
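
The two-pointer layout of a method pointer is exposed by the System record TMethod, whose Code field precedes its Data field. A sketch assuming Win32; THandler and the procedural type names are hypothetical:

```pascal
program MethodPointerLayout;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  TPlainProc = procedure(X: Integer);                  // procedure pointer
  TNotifyProc = procedure(Sender: TObject) of object;  // method pointer

  THandler = class
    procedure Handle(Sender: TObject);
  end;

procedure THandler.Handle(Sender: TObject);
begin
end;

var
  H: THandler;
  M: TNotifyProc;
begin
  Assert(SizeOf(TPlainProc) = SizeOf(Pointer));       // code pointer only
  Assert(SizeOf(TNotifyProc) = 2 * SizeOf(Pointer));  // code + object
  H := THandler.Create;
  try
    M := H.Handle;
    Assert(TMethod(M).Code = @THandler.Handle);  // entry point first
    Assert(TMethod(M).Data = Pointer(H));        // then the object
  finally
    H.Free;
  end;
  Writeln('ok');
end.
```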

Class Types

On the Win32 platform, a class-type value is stored as a 32-bit pointer to an instance of the class, which is called an object. The internal data format of an object resembles that of a record. The object's fields are stored in order of declaration as a sequence of contiguous variables. Fields are always aligned, corresponding to an unpacked record type. Any fields inherited from an ancestor class are stored before the new fields defined in the descendant class.

The first 4-byte field of every object is a pointer to the virtual method table (VMT) of the class. There is exactly one VMT per class (not one per object); distinct class types, no matter how similar, never share a VMT. VMTs are built automatically by the compiler, and are never directly manipulated by a program. Pointers to VMTs, which are automatically stored by constructor methods in the objects they create, are also never directly manipulated by a program.
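
The VMT pointer in the first field holds the same value as the class reference, and the fields follow it in declaration order, ancestor fields first. A sketch assuming Win32; the classes are hypothetical:

```pascal
program ObjectLayout;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  TBase = class
    A: Integer;
  end;
  TDerived = class(TBase)
    B: Integer;
  end;

var
  Obj: TDerived;
  Cls: TClass;
begin
  Obj := TDerived.Create;
  try
    Cls := TDerived;
    // First field: pointer to the VMT, equal to the class reference
    Assert(PPointer(Obj)^ = Pointer(Cls));
    // Inherited A comes right after the VMT pointer, then the new field B
    Assert(NativeInt(@Obj.A) - NativeInt(Obj) = SizeOf(Pointer));
    Assert(NativeInt(@Obj.B) - NativeInt(Obj) = SizeOf(Pointer) + SizeOf(Integer));
  finally
    Obj.Free;
  end;
  Writeln('ok');
end.
```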

The layout of a VMT is shown in the following table. At positive offsets, a VMT consists of a list of 32-bit method pointers, one per user-defined virtual method in the class type, in order of declaration. Each slot contains the address of the corresponding virtual method's entry point. This layout is compatible with a C++ v-table and with COM. At negative offsets, a VMT contains a number of fields that are internal to Delphi's implementation. Applications should use the methods defined in TObject to query this information, since the layout is likely to change in future implementations of the Delphi language.

Virtual method table layout (Win32 Only)

Offset	Type	Description
-76	Pointer	pointer to virtual method table (or nil)
-72	Pointer	pointer to interface table (or nil)
-68	Pointer	pointer to Automation information table (or nil)
-64	Pointer	pointer to instance initialization table (or nil)
-60	Pointer	pointer to type information table (or nil)
-56	Pointer	pointer to field definition table (or nil)
-52	Pointer	pointer to method definition table (or nil)
-48	Pointer	pointer to dynamic method table (or nil)
-44	Pointer	pointer to short string containing class name
-40	Cardinal	instance size in bytes
-36	Pointer	pointer to a pointer to ancestor class (or nil)
-32	Pointer	pointer to entry point of SafecallException method (or nil)
-28	Pointer	entry point of AfterConstruction method
-24	Pointer	entry point of BeforeDestruction method
-20	Pointer	entry point of Dispatch method
-16	Pointer	entry point of DefaultHandler method
-12	Pointer	entry point of NewInstance method
-8	Pointer	entry point of FreeInstance method
-4	Pointer	entry point of Destroy destructor
0	Pointer	entry point of first user-defined virtual method
4	Pointer	entry point of second user-defined virtual method
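
Where reading VMT fields directly is unavoidable, the System unit declares constants such as vmtInstanceSize and vmtClassName for the negative offsets. Their values may differ from the table above in newer compiler versions, so reading through them is safer than hard-coding offsets; the supported route remains TObject methods such as InstanceSize and ClassName. A sketch assuming Win32, with a hypothetical class:

```pascal
program VmtFields;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}
{$POINTERMATH ON}

type
  TSample = class
    Value: Integer;
  end;

var
  Vmt: PByte;
begin
  Vmt := PByte(Pointer(TSample));  // class reference = VMT address
  // Instance size through the compiler-supplied offset
  Assert(PInteger(Vmt + vmtInstanceSize)^ = TSample.InstanceSize);
  // Class name: the slot holds a pointer to a short string
  Assert(PShortString(PPointer(Vmt + vmtClassName)^)^ = 'TSample');
  Assert(TSample.ClassName = 'TSample');  // the supported way
  Writeln('ok');
end.
```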

Class Reference Types

On the Win32 platform, a class-reference value is stored as a 32-bit pointer to the virtual method table (VMT) of a class.

Variant Types

The following discussion of the internal layout of variant types applies to the Win32 platform only. Variants rely on boxing and unboxing of data into an object wrapper, as well as Delphi helper classes to implement the variant-related RTL functions.

On the Win32 platform, a variant is stored as a 16-byte record that contains a type code and a value (or a reference to a value) of the type given by the code. The System and Variants units define constants and types for variants.

The TVarData type represents the internal structure of a Variant variable (on Windows, this is identical to the Variant type used by COM and the Win32 API). The TVarData type can be used in typecasts of Variant variables to access the internal structure of a variable. The TVarData record contains the following fields:

VType contains the type code of the variant in the lower twelve bits (the bits defined by the varTypeMask constant). In addition, the varArray bit may be set to indicate that the variant is an array, and the varByRef bit may be set to indicate that the variant contains a reference as opposed to a value.
The Reserved1, Reserved2, and Reserved3 fields are unused.

The contents of the remaining eight bytes of a TVarData record depend on the VType field as follows:

If neither the varArray nor the varByRef bits are set, the variant contains a value of the given type.
If the varArray bit is set, the variant contains a pointer to a TVarArray structure that defines an array. The type of each array element is given by the varTypeMask bits in the VType field.
If the varByRef bit is set, the variant contains a reference to a value of the type given by the varTypeMask and varArray bits in the VType field.
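
The TVarData typecast can be used to inspect a variant holding an Integer. The value is copied from an Integer variable so that the type code is predictably varInteger. A sketch assuming Win32 (the record size differs on other platforms):

```pascal
program VariantLayout;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  V: Variant;
  I: Integer;
begin
  Assert(SizeOf(TVarData) = 16);                  // Win32 layout
  I := 42;
  V := I;
  Assert(TVarData(V).VType = varInteger);         // type code
  Assert((TVarData(V).VType and varByRef) = 0);   // holds a value, not a reference
  Assert((TVarData(V).VType and varArray) = 0);   // not an array
  Assert(TVarData(V).VInteger = 42);              // value stored inline
  Writeln('ok');
end.
```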

The varString type code is private. Variants containing a varString value should never be passed to a non-Delphi function. On Win32, Delphi's Automation support automatically converts varString variants to varOleStr variants before passing them as parameters to external functions.