RAD Studio (Common)
Assembly Expressions (Win32 Only)

The built-in assembler evaluates all expressions as 32-bit integer values. It doesn't support floating-point and string values, except string constants. The inline assembler is available only on the Win32 Delphi compiler. 

Expressions are built from expression elements and operators, and each expression has an associated expression class and expression type. This topic covers the following material:

  • Differences between Delphi and Assembler Expressions
  • Expression Elements
  • Expression Classes
  • Expression Types
  • Expression Operators

Differences between Delphi and Assembler Expressions

The most important difference between Delphi expressions and built-in assembler expressions is that assembler expressions must resolve to a constant value. In other words, they must resolve to a value that can be computed at compile time. For example, given the declarations

const
 X = 10;
 Y = 20;
var
 Z: Integer;

the following is a valid statement.

asm
 MOV        Z,X+Y
end;

Because both X and Y are constants, the expression X + Y is a convenient way of writing the constant 30, and the resulting instruction simply moves the value 30 into the variable Z. But if X and Y are variables

var
 X, Y: Integer;

the built-in assembler cannot compute the value of X + Y at compile time. In this case, to move the sum of X and Y into Z you would use

asm
 MOV            EAX,X
 ADD            EAX,Y
 MOV            Z,EAX
end;

In a Delphi expression, a variable reference denotes the contents of the variable. But in an assembler expression, a variable reference denotes the address of the variable. In Delphi the expression X + 4 (where X is a variable) means the contents of X plus 4, while to the built-in assembler it means the contents of the word at the address four bytes higher than the address of X. So, even though you are allowed to write

asm
 MOV            EAX,X+4
end;

this code doesn't load the value of X plus 4 into EAX; instead, it loads the value of a word stored four bytes beyond X. The correct way to add 4 to the contents of X is

asm
 MOV            EAX,X
 ADD            EAX,4
end;

Expression Elements

The elements of an expression are constants, registers, and symbols.

Numeric Constants

Numeric constants must be integers, and their values must be between -2,147,483,648 and 4,294,967,295.

By default, numeric constants use decimal notation, but the built-in assembler also supports binary, octal, and hexadecimal. Binary notation is selected by writing a B after the number, octal notation by writing an O after the number, and hexadecimal notation by writing an H after the number or a $ before the number. 

Numeric constants must start with one of the digits 0 through 9 or the $ character. When you write a hexadecimal constant using the H suffix, an extra zero is required in front of the number if the first significant digit is one of the digits A through F. For example, 0BAD4H and $BAD4 are hexadecimal constants, but BAD4H is an identifier because it starts with a letter.
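The notations described above can be compared side by side. The following is a minimal sketch for the Win32 compiler, not a definitive reference; the program and function names are hypothetical and chosen only for illustration. It spells the value 255 in each notation and reports whether the assembler treated them all as the same immediate value.

```delphi
program NumericNotations;
{$APPTYPE CONSOLE}

// Hypothetical helper (not part of any library): returns True if the
// decimal, hexadecimal, binary, and octal spellings of 255 all assemble
// to the same immediate value.
function SameValueInAllNotations: Boolean;
asm
        MOV     EDX,255           // decimal (the default notation)
        XOR     EAX,EAX           // assume False until every test passes
        CMP     EDX,0FFH          // hexadecimal, H suffix (leading 0 required)
        JNE     @@Done
        CMP     EDX,$FF           // hexadecimal, $ prefix
        JNE     @@Done
        CMP     EDX,11111111B     // binary, B suffix
        JNE     @@Done
        CMP     EDX,377O          // octal, O suffix
        JNE     @@Done
        MOV     AL,1              // all comparisons matched
@@Done:
end;

begin
  WriteLn(SameValueInAllNotations);
end.
```

Writing FFH without the leading zero would be read as an identifier named FFH rather than as a number, which is why the sketch uses 0FFH.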

String Constants

String constants must be enclosed in single or double quotation marks. Two consecutive quotation marks of the same type as the enclosing quotation marks count as only one character. Here are some examples of string constants:

 'Z'
 'Delphi'
 'Linux'
 "That's all folks"
 '"That''s all folks," he said.'
 '100'
 '"'
 "'" 

String constants of any length are allowed in DB directives, and cause allocation of a sequence of bytes containing the ASCII values of the characters in the string. In all other cases, a string constant can be no longer than four characters and denotes a numeric value which can participate in an expression. The numeric value of a string constant is calculated as

Ord(Ch1) + Ord(Ch2) shl 8 + Ord(Ch3) shl 16 + Ord(Ch4) shl 24

where Ch1 is the rightmost (last) character and Ch4 is the leftmost (first) character. If the string is shorter than four characters, the leftmost characters are assumed to be zero. The following table shows string constants and their numeric values.  

String examples and their values

String      Value
'a'         00000061H
'ba'        00006261H
'cba'       00636261H
'dcba'      64636261H
'a '        00006120H
'   a'      20202061H
'a' * 2     000000C2H
'a'-'A'     00000020H
not 'a'     FFFFFF9EH
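The packing formula for string constants can be checked from Delphi code. This is a sketch under stated assumptions: PackedValue and AsmValue are hypothetical names introduced here, and PackedValue assumes its argument is at most four characters long. It applies the formula so that the last character ends up in the low-order byte, then compares the result with what the built-in assembler produces for the same constant.

```delphi
program StringConstValue;
{$APPTYPE CONSOLE}

// Hypothetical helper: packs up to four characters into a 32-bit value.
// Each step shifts the earlier characters up by one byte, so the last
// character of the string lands in the low-order byte.
function PackedValue(const S: AnsiString): Cardinal;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    Result := (Result shl 8) or Ord(S[I]);
end;

// The same four-character constant as evaluated by the built-in assembler.
function AsmValue: Cardinal;
asm
        MOV     EAX,'dcba'
end;

begin
  // Compare the Delphi-side calculation with the documented value.
  WriteLn(PackedValue('dcba') = $64636261);
  // Compare the Delphi-side calculation with the assembler's value.
  WriteLn(AsmValue = PackedValue('dcba'));
end.
```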

 

Registers

The following reserved symbols denote CPU registers in the inline assembler:  

CPU registers

Register group               Registers
32-bit general purpose       EAX EBX ECX EDX
32-bit pointer or index      ESP EBP ESI EDI
16-bit general purpose       AX BX CX DX
16-bit pointer or index      SP BP SI DI
8-bit low registers          AL BL CL DL
8-bit high registers         AH BH CH DH
16-bit segment registers     CS DS SS ES FS GS
Coprocessor register stack   ST

When an operand consists solely of a register name, it is called a register operand. All registers can be used as register operands, and some registers can be used in other contexts. 

The base registers (BX and BP) and the index registers (SI and DI) can be written within square brackets to indicate indexing. Valid base/index register combinations are [BX], [BP], [SI], [DI], [BX+SI], [BX+DI], [BP+SI], and [BP+DI]. You can also index with all the 32-bit registers; for example, [EAX+ECX], [ESP], and [ESP+EAX+5].

The segment registers (ES, CS, SS, DS, FS, and GS) are supported, but segments are normally not useful in 32-bit applications. 

The symbol ST denotes the topmost register on the 8087 floating-point register stack. Each of the eight floating-point registers can be referred to using ST(X), where X is a constant between 0 and 7 indicating the distance from the top of the register stack.
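As an illustration of the ST(X) notation, the following sketch adds two Double values on the floating-point register stack. It assumes the Win32 compiler, where a Double function result is returned in ST(0); the names FpuStackDemo and AddDoubles are hypothetical.

```delphi
program FpuStackDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: adds two Double values using the FPU register stack.
// On Win32 the function result is left in ST(0).
function AddDoubles(A, B: Double): Double;
asm
        FLD     A                 // push A: ST(0) = A
        FLD     B                 // push B: ST(0) = B, ST(1) = A
        FADDP   ST(1),ST(0)       // ST(1) := ST(1) + ST(0), then pop
end;

begin
  WriteLn(AddDoubles(1.5, 2.25):0:2);
end.
```

After FADDP pops the stack, the sum that was written to ST(1) becomes the new ST(0), which is where the caller expects the result.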

Symbols

The built-in assembler allows you to access almost all Delphi identifiers in assembly language expressions, including constants, types, variables, procedures, and functions. In addition, the built-in assembler implements the special symbol @Result, which corresponds to the Result variable within the body of a function. For example, the function

function Sum(X, Y: Integer): Integer;
begin 
        Result := X + Y; 
end; 

could be written in assembly language as

function Sum(X, Y: Integer): Integer; stdcall;
begin
 asm
  MOV     EAX,X
  ADD     EAX,Y
  MOV     @Result,EAX
 end;
end;

The following symbols cannot be used in asm statements:

  • Standard procedures and functions (for example, WriteLn and Chr).
  • String, floating-point, and set constants (except when loading registers).
  • Labels that aren't declared in the current block.
  • The @Result symbol outside of functions.

The following table summarizes the kinds of symbols that can be used in asm statements.

Symbols recognized by the built-in assembler

Symbol      Value                                                         Class              Type
Label       Address of label                                              Memory reference   Size of type
Constant    Value of constant                                             Immediate value    0
Type        0                                                             Memory reference   Size of type
Field       Offset of field                                               Memory reference   Size of type
Variable    Address of variable or address of a pointer to the variable   Memory reference   Size of type
Procedure   Address of procedure                                          Memory reference   Size of type
Function    Address of function                                           Memory reference   Size of type
Unit        0                                                             Immediate value    0
@Result     Result variable offset                                        Memory reference   Size of type

With optimizations disabled, local variables (variables declared in procedures and functions) are always allocated on the stack and accessed relative to EBP, and the value of a local variable symbol is its signed offset from EBP. The assembler automatically adds [EBP] in references to local variables. For example, given the declaration

var Count: Integer;

within a function or procedure, the instruction

MOV       EAX,Count

assembles into MOV EAX,[EBP-4].

The built-in assembler treats var parameters as 32-bit pointers, and the size of a var parameter is always 4. The syntax for accessing a var parameter is different from that for accessing a value parameter. To access the contents of a var parameter, you must first load the 32-bit pointer and then access the location it points to. For example,

function Sum(var X, Y: Integer): Integer; stdcall;
begin
 asm
  MOV     EAX,X
  MOV     EAX,[EAX]
  MOV     EDX,Y
  ADD     EAX,[EDX]
  MOV     @Result,EAX
 end;
end;
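The same load-the-pointer-then-dereference pattern applies when writing through var parameters. The sketch below is an illustration only: SwapInts is a hypothetical routine, and it relies on EAX, EDX, and ECX being free for use inside an asm statement.

```delphi
program VarParamSwap;
{$APPTYPE CONSOLE}

// Hypothetical helper: exchanges the contents of two Integer variables.
// Each var parameter is a 32-bit pointer, so it is loaded into a
// register first and dereferenced with square brackets afterwards.
procedure SwapInts(var A, B: Integer); stdcall;
begin
  asm
        MOV     EAX,A             // EAX := pointer to A (not its contents)
        MOV     EDX,B             // EDX := pointer to B
        MOV     ECX,[EAX]         // ECX := contents of A
        XCHG    ECX,[EDX]         // B := old A, ECX := old B
        MOV     [EAX],ECX         // A := old B
  end;
end;

var
  P, Q: Integer;
begin
  P := 1;
  Q := 2;
  SwapInts(P, Q);
  WriteLn(P, ' ', Q);
end.
```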

Identifiers can be qualified within asm statements. For example, given the declarations

type
 TPoint = record
  X, Y: Integer;
 end;
 TRect = record
  A, B: TPoint;
 end;
var
 P: TPoint;
 R: TRect;

the following constructions can be used in an asm statement to access fields.

MOV         EAX,P.X
MOV         EDX,P.Y
MOV         ECX,R.A.X
MOV         EBX,R.B.Y

A type identifier can be used to construct variables on the fly. Each of the following instructions generates the same machine code, which loads the B.X field of the TRect record that EDX points to (the double word at [EDX+8]) into EAX.

MOV         EAX,(TRect PTR [EDX]).B.X
MOV         EAX,TRect([EDX]).B.X
MOV         EAX,TRect[EDX].B.X
MOV         EAX,[EDX].TRect.B.X
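A complete routine built on this kind of typecast might look like the following sketch. It assumes the default register calling convention on Win32, under which the address of the first var parameter arrives in EAX; GetBX is a hypothetical name, and the record types are redeclared locally so that the program is self-contained.

```delphi
program FieldViaTypeCast;
{$APPTYPE CONSOLE}

type
  TPoint = record
    X, Y: Integer;
  end;
  TRect = record
    A, B: TPoint;
  end;

// Hypothetical helper: returns R.B.X.
// Assumption: with the register convention, EAX holds the address of R.
function GetBX(var R: TRect): Integer;
asm
        MOV     EAX,[EAX].TRect.B.X   // treat [EAX] as a TRect, read B.X
end;

var
  R: TRect;
begin
  R.A.X := 1;
  R.A.Y := 2;
  R.B.X := 3;
  R.B.Y := 4;
  WriteLn(GetBX(R));
end.
```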

Expression Classes

The built-in assembler divides expressions into three classes: registers, memory references, and immediate values.

An expression that consists solely of a register name is a register expression. Examples of register expressions are AX, CL, DI, and ES. Used as operands, register expressions direct the assembler to generate instructions that operate on the CPU registers. 

Expressions that denote memory locations are memory references. Delphi's labels, variables, typed constants, procedures, and functions belong to this category. 

Expressions that aren't registers and aren't associated with memory locations are immediate values. This group includes Delphi's untyped constants and type identifiers. 

Immediate values and memory references cause different code to be generated when used as operands. For example,

const
 Start = 10;
var
 Count: Integer;
    .
    .
    .
asm
 MOV     EAX,Start           { MOV EAX,xxxx }
 MOV     EBX,Count           { MOV EBX,[xxxx] }
 MOV     ECX,[Start]         { MOV ECX,[xxxx] }
 MOV     EDX,OFFSET Count    { MOV EDX,xxxx }
end;

Because Start is an immediate value, the first MOV is assembled into a move immediate instruction. The second MOV, however, is translated into a move memory instruction, as Count is a memory reference. In the third MOV, the brackets convert Start into a memory reference (in this case, the word at offset 10 in the data segment). In the fourth MOV, the OFFSET operator converts Count into an immediate value (the offset of Count in the data segment). 

The brackets and OFFSET operator complement each other. The following asm statement produces identical machine code to the first two lines of the previous asm statement.

asm
 MOV            EAX,OFFSET [Start]
 MOV            EBX,[OFFSET Count]
end;
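The difference between a memory reference and an OFFSET immediate can be observed from Delphi code. The following sketch uses a global variable and two hypothetical functions, CountValue and CountAddress; it is an illustration under those assumptions rather than a prescribed pattern.

```delphi
program OffsetDemo;
{$APPTYPE CONSOLE}

var
  Count: Integer = 42;

// Memory reference: loads the contents of Count.
function CountValue: Integer;
asm
        MOV     EAX,Count
end;

// Immediate value: loads the address of Count.
function CountAddress: Pointer;
asm
        MOV     EAX,OFFSET Count
end;

begin
  // The first function should agree with the variable's contents,
  // the second with its address as seen from Delphi code.
  WriteLn(CountValue = Count);
  WriteLn(CountAddress = @Count);
end.
```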

Memory references and immediate values are further classified as either relocatable or absolute. Relocation is the process by which the linker assigns absolute addresses to symbols. A relocatable expression denotes a value that requires relocation at link time, while an absolute expression denotes a value that requires no such relocation. Typically, expressions that refer to labels, variables, procedures, or functions are relocatable, since the final address of these symbols is unknown at compile time. Expressions that operate solely on constants are absolute. 

The built-in assembler allows you to carry out any operation on an absolute value, but it restricts operations on relocatable values to addition and subtraction of constants.

Expression Types

Every built-in assembler expression has a type, or more correctly a size, because the assembler regards the type of an expression simply as the size of its memory location. For example, the type of an Integer variable is four, because it occupies 4 bytes. The built-in assembler performs type checking whenever possible, so in the instructions

var
 QuitFlag: Boolean;
 OutBufPtr: Word;
    .
    .
    .
asm
 MOV            AL,QuitFlag
 MOV            BX,OutBufPtr
end;

the assembler checks that the size of QuitFlag is one (a byte), and that the size of OutBufPtr is two (a word). The instruction

MOV           DL,OutBufPtr

produces an error because DL is a byte-sized register and OutBufPtr is a word. The type of a memory reference can be changed through a typecast; these are correct ways of writing the previous instruction:

MOV         DL,BYTE PTR OutBufPtr
MOV         DL,Byte(OutBufPtr)
MOV         DL,OutBufPtr.Byte

These MOV instructions all refer to the first (least significant) byte of the OutBufPtr variable. 

In some cases, a memory reference is untyped. One example is an immediate value (Buffer) enclosed in square brackets:

procedure Example(var Buffer);
asm
 MOV     AL,[Buffer]
 MOV     CX,[Buffer]
 MOV     EDX,[Buffer]
end;

The built-in assembler permits these instructions, because the expression [Buffer] has no type. [Buffer] means "the contents of the location indicated by Buffer," and the type can be determined from the first operand (byte for AL, word for CX, and double-word for EDX). 

In cases where the type can't be determined from another operand, the built-in assembler requires an explicit typecast. For example,

INC     BYTE PTR [ECX]
IMUL    WORD PTR [EDX]

The following table summarizes the predefined type symbols that the built-in assembler provides in addition to any currently declared Delphi types.  

Predefined type symbols

Symbol   Type
BYTE     1
WORD     2
DWORD    4
QWORD    8
TBYTE    10

Expression Operators

The built-in assembler provides a variety of operators. Precedence rules differ from those of the Delphi language; for example, in an asm statement, AND has lower precedence than the addition and subtraction operators. The following table lists the built-in assembler's expression operators in decreasing order of precedence.

Precedence of built-in assembler expression operators

Operators                                      Remarks          Precedence
&                                                               highest
(...), [...], ., HIGH, LOW
+, -                                           unary + and -
:
OFFSET, TYPE, PTR, *, /, MOD, SHL, SHR, +, -   binary + and -
NOT, AND, OR, XOR                                               lowest
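The precedence difference is easiest to see by evaluating the same expression in both languages. The sketch below is an illustration only; AsmValue is a hypothetical name. In Delphi, and binds more tightly than +, so 6 and 3 + 1 groups as (6 and 3) + 1. According to the table above, the built-in assembler should instead group the expression as 6 AND (3 + 1).

```delphi
program PrecedenceDemo;
{$APPTYPE CONSOLE}

// The constant expression is evaluated by the built-in assembler at
// compile time, using the assembler's own precedence rules.
function AsmValue: Integer;
asm
        MOV     EAX,6 AND 3 + 1
end;

begin
  WriteLn(6 and 3 + 1);   // Delphi grouping: (6 and 3) + 1
  WriteLn(AsmValue);      // assembler grouping per the table: 6 AND (3 + 1)
end.
```

Adding explicit parentheses in the asm statement removes any dependence on the precedence rules and is a reasonable habit when porting constant expressions from Delphi code.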

The following table defines the built-in assembler's expression operators.  

Definitions of built-in assembler expression operators  

Operator  
Description  
&  
Identifier override. The identifier immediately following the ampersand is treated as a user-defined symbol, even if the spelling is the same as a built-in assembler reserved symbol.  
(... )  
Subexpression. Expressions within parentheses are evaluated completely prior to being treated as a single expression element. Another expression can precede the expression within the parentheses; the result in this case is the sum of the values of the two expressions, with the type of the first expression.  
[... ]  
Memory reference. The expression within brackets is evaluated completely prior to being treated as a single expression element. Another expression can precede the expression within the brackets; the result in this case is the sum of the values of the two expressions, with the type of the first expression. The result is always a memory reference.  
.  
Structure member selector. The result is the sum of the expression before the period and the expression after the period, with the type of the expression after the period. Symbols belonging to the scope identified by the expression before the period can be accessed in the expression after the period.  
HIGH  
Returns the high-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value.  
LOW  
Returns the low-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value.  
+  
Unary plus. Returns the expression following the plus with no changes. The expression must be an absolute immediate value.  
-  
Unary minus. Returns the negated value of the expression following the minus. The expression must be an absolute immediate value.  
+  
Addition. The expressions can be immediate values or memory references, but only one of the expressions can be a relocatable value. If one of the expressions is a relocatable value, the result is also a relocatable value. If either of the expressions is a memory reference, the result is also a memory reference.  
-  
Subtraction. The first expression can have any class, but the second expression must be an absolute immediate value. The result has the same class as the first expression.  
:  
Segment override. Instructs the assembler that the expression after the colon belongs to the segment given by the segment register name (CS, DS, SS, FS, GS, or ES) before the colon. The result is a memory reference with the value of the expression after the colon. When a segment override is used in an instruction operand, the instruction is prefixed with an appropriate segment-override prefix instruction to ensure that the indicated segment is selected.  
OFFSET  
Returns the offset part (double word) of the expression following the operator. The result is an immediate value.  
TYPE  
Returns the type (size in bytes) of the expression following the operator. The type of an immediate value is 0.  
PTR  
Typecast operator. The result is a memory reference with the value of the expression following the operator and the type of the expression in front of the operator.  
*  
Multiplication. Both expressions must be absolute immediate values, and the result is an absolute immediate value.  
/  
Integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value.  
MOD  
Remainder after integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value.  
SHL  
Logical shift left. Both expressions must be absolute immediate values, and the result is an absolute immediate value.  
SHR  
Logical shift right. Both expressions must be absolute immediate values, and the result is an absolute immediate value.  
NOT  
Bitwise negation. The expression must be an absolute immediate value, and the result is an absolute immediate value.  
AND  
Bitwise AND. Both expressions must be absolute immediate values, and the result is an absolute immediate value.  
OR  
Bitwise OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value.  
XOR  
Bitwise exclusive OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value.  
Copyright(C) 2008 CodeGear(TM). All Rights Reserved.