RAD Studio (Common)
The built-in assembler evaluates all expressions as 32-bit integer values. It doesn't support floating-point and string values, except string constants. The inline assembler is available only on the Win32 Delphi compiler.
Expressions are built from expression elements and operators, and each expression has an associated expression class and expression type. This topic covers the differences between Delphi and assembler expressions, expression elements, expression classes, expression types, and expression operators.
The most important difference between Delphi expressions and built-in assembler expressions is that assembler expressions must resolve to a constant value, that is, a value that can be computed at compile time. For example, given the declarations
    const
      X = 10;
      Y = 20;
    var
      Z: Integer;
the following is a valid statement.
    asm
      MOV Z,X+Y
    end;
Because both X and Y are constants, the expression X + Y is a convenient way of writing the constant 30, and the resulting instruction simply moves the value 30 into the variable Z. But if X and Y are variables:
    var
      X, Y: Integer;
the built-in assembler cannot compute the value of X + Y at compile time. In this case, to move the sum of X and Y into Z you would use
    asm
      MOV EAX,X
      ADD EAX,Y
      MOV Z,EAX
    end;
In a Delphi expression, a variable reference denotes the contents of the variable. But in an assembler expression, a variable reference denotes the address of the variable. In Delphi the expression X + 4 (where X is a variable) means the contents of X plus 4, while to the built-in assembler it means the contents of the word at the address four bytes higher than the address of X. So, even though you are allowed to write
    asm
      MOV EAX,X+4
    end;
this code doesn't load the value of X plus 4 into EAX; instead, it loads the 32-bit value stored four bytes beyond X. The correct way to add 4 to the contents of X is
    asm
      MOV EAX,X
      ADD EAX,4
    end;
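The difference between "address plus 4" and "contents plus 4" can be modelled outside the assembler. The Python sketch below is only an illustration of the two readings, not a description of how the compiler works: memory, ADDR_X, and load_dword are hypothetical names, and the layout (an unrelated double word holding 99 placed directly after X) is an assumption chosen for the example.

```python
import struct

# Hypothetical flat memory: X lives at address 0 and an unrelated
# double word lives directly after it, at address 4.
memory = bytearray(8)
ADDR_X = 0
struct.pack_into("<i", memory, ADDR_X, 10)      # X = 10
struct.pack_into("<i", memory, ADDR_X + 4, 99)  # neighbour of X

def load_dword(address):
    """Read a little-endian 32-bit value, as a MOV from memory would."""
    return struct.unpack_from("<i", memory, address)[0]

# Assembler reading of X+4: take the address of X, add 4, then load.
asm_style = load_dword(ADDR_X + 4)

# Delphi reading of X + 4: load the contents of X, then add 4.
delphi_style = load_dword(ADDR_X) + 4

print(asm_style, delphi_style)  # prints: 99 14
```

In this model the assembler reading returns whatever happens to be stored after X, which is why the two-instruction MOV/ADD sequence is needed to add 4 to the contents of X.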
The elements of an expression are constants, registers, and symbols.
Numeric constants must be integers, and their values must be between -2,147,483,648 and 4,294,967,295.
By default, numeric constants use decimal notation, but the built-in assembler also supports binary, octal, and hexadecimal. Binary notation is selected by writing a B after the number, octal notation by writing an O after the number, and hexadecimal notation by writing an H after the number or a $ before the number.
Numeric constants must start with one of the digits 0 through 9 or the $ character. When you write a hexadecimal constant using the H suffix, an extra zero is required in front of the number if the first significant digit is one of the digits A through F. For example, 0BAD4H and $BAD4 are hexadecimal constants, but BAD4H is an identifier because it starts with a letter.
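The notation rules above can be summarised as a small classifier. This is a minimal Python sketch, not the assembler's actual lexer: parse_asm_number is a hypothetical helper, and the order in which the suffixes are tested (H before B before O) is an assumption made so that a constant such as 0BH is read as hexadecimal.

```python
DIGITS = "0123456789ABCDEF"

def parse_asm_number(token):
    """Return the value of a numeric constant, or None when the token
    starts with a letter and would therefore be read as an identifier."""
    text = token.upper()
    if text.startswith("$"):
        digits, base = text[1:], 16          # $ prefix: hexadecimal
    elif text and text[0] in "0123456789":
        if text.endswith("H"):
            digits, base = text[:-1], 16     # H suffix: hexadecimal
        elif text.endswith("B"):
            digits, base = text[:-1], 2      # B suffix: binary
        elif text.endswith("O"):
            digits, base = text[:-1], 8      # O suffix: octal
        else:
            digits, base = text, 10          # default: decimal
    else:
        return None
    allowed = DIGITS[:base]
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError(f"malformed numeric constant: {token!r}")
    value = int(digits, base)
    if value > 4_294_967_295:
        raise ValueError(f"constant out of range: {token!r}")
    return value

print(parse_asm_number("0BAD4H"), parse_asm_number("$BAD4"))  # 47828 47828
print(parse_asm_number("BAD4H"))                              # None
```

Negative values are not handled here because a leading minus sign is a unary operator applied to the constant, not part of the constant itself.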
String constants must be enclosed in single or double quotation marks. Two consecutive quotation marks of the same type as the enclosing quotation marks count as only one character. Here are some examples of string constants:
    'Z'
    'Delphi'
    'Linux'
    "That's all folks"
    '"That''s all folks," he said.'
    '100'
    '"'
    "'"
String constants of any length are allowed in DB directives, and cause allocation of a sequence of bytes containing the ASCII values of the characters in the string. In all other cases, a string constant can be no longer than four characters and denotes a numeric value which can participate in an expression. The numeric value of a string constant is calculated as
Ord(Ch1) + Ord(Ch2) shl 8 + Ord(Ch3) shl 16 + Ord(Ch4) shl 24
where Ch1 is the rightmost (last) character and Ch4 is the leftmost (first) character. If the string is shorter than four characters, the leftmost characters are assumed to be zero. The following table shows string constants and their numeric values.
String examples and their values
String | Value
'a' | 00000061H
'ba' | 00006261H
'cba' | 00636261H
'dcba' | 64636261H
'a ' | 00006120H
'   a' | 20202061H
'a' * 2 | 000000C2H
'a'-'A' | 00000020H
not 'a' | FFFFFF9EH
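The formula for the numeric value of a string constant can be checked with a short script. The Python sketch below is illustrative only: string_constant_value and as_hex are hypothetical helper names, and the characters are assumed to be plain ASCII.

```python
def string_constant_value(text):
    """Numeric value of a string constant of one to four characters.

    The last character lands in the low-order byte, following
    Ord(Ch1) + Ord(Ch2) shl 8 + Ord(Ch3) shl 16 + Ord(Ch4) shl 24.
    """
    if not 1 <= len(text) <= 4:
        raise ValueError("expected one to four characters")
    value = 0
    for position, ch in enumerate(reversed(text)):
        value += ord(ch) << (8 * position)
    return value

def as_hex(value):
    """Format a value as a 32-bit hexadecimal constant with an H suffix."""
    return f"{value & 0xFFFFFFFF:08X}H"

print(as_hex(string_constant_value("a")))        # 00000061H
print(as_hex(string_constant_value("dcba")))     # 64636261H
print(as_hex(string_constant_value("a ")))       # 00006120H
print(as_hex(string_constant_value("a") * 2))    # 000000C2H
print(as_hex(~string_constant_value("a")))       # FFFFFF9EH
```

Masking with 0FFFFFFFFH in as_hex mirrors the fact that the assembler evaluates expressions as 32-bit integer values.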
The following reserved symbols denote CPU registers in the inline assembler:
CPU registers
Category | Registers
32-bit general purpose | EAX EBX ECX EDX
32-bit pointer or index | ESP EBP ESI EDI
16-bit general purpose | AX BX CX DX
16-bit pointer or index | SP BP SI DI
8-bit low registers | AL BL CL DL
8-bit high registers | AH BH CH DH
16-bit segment registers | CS DS SS ES FS GS
Coprocessor register stack | ST
When an operand consists solely of a register name, it is called a register operand. All registers can be used as register operands, and some registers can be used in other contexts.
The base registers (BX and BP) and the index registers (SI and DI) can be written within square brackets to indicate indexing. Valid base/index register combinations are [BX], [BP], [SI], [DI], [BX+SI], [BX+DI], [BP+SI], and [BP+DI]. You can also index with all the 32-bit registers, for example, [EAX+ECX], [ESP], and [ESP+EAX+5].
The segment registers (ES, CS, SS, DS, FS, and GS) are supported, but segments are normally not useful in 32-bit applications.
The symbol ST denotes the topmost register on the 8087 floating-point register stack. Each of the eight floating-point registers can be referred to using ST(X), where X is a constant between 0 and 7 indicating the distance from the top of the register stack.
The built-in assembler allows you to access almost all Delphi identifiers in assembly language expressions, including constants, types, variables, procedures, and functions. In addition, the built-in assembler implements the special symbol @Result, which corresponds to the Result variable within the body of a function. For example, the function
    function Sum(X, Y: Integer): Integer;
    begin
      Result := X + Y;
    end;
could be written in assembly language as
    function Sum(X, Y: Integer): Integer; stdcall;
    begin
      asm
        MOV EAX,X
        ADD EAX,Y
        MOV @Result,EAX
      end;
    end;
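The following symbols cannot be used in asm statements: standard procedures and functions (for example, Writeln and Chr); string, floating-point, and set constants (except when loading registers); labels that aren't declared in the current block; and the @Result symbol outside of functions. The following table summarizes the kinds of symbols that can be used in asm statements.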
Symbols recognized by the built-in assembler
Symbol | Value | Class | Type
Label | Address of label | Memory reference | Size of type
Constant | Value of constant | Immediate value | 0
Type | 0 | Memory reference | Size of type
Field | Offset of field | Memory reference | Size of type
Variable | Address of variable or address of a pointer to the variable | Memory reference | Size of type
Procedure | Address of procedure | Memory reference | Size of type
Function | Address of function | Memory reference | Size of type
Unit | 0 | Immediate value | 0
@Result | Result variable offset | Memory reference | Size of type
With optimizations disabled, local variables (variables declared in procedures and functions) are always allocated on the stack and accessed relative to EBP, and the value of a local variable symbol is its signed offset from EBP. The assembler automatically adds [EBP] in references to local variables. For example, given the declaration
var Count: Integer;
within a function or procedure, the instruction
MOV EAX,Count
assembles into MOV EAX,[EBP-4].
The built-in assembler treats var parameters as 32-bit pointers, and the size of a var parameter is always 4. The syntax for accessing a var parameter is different from that for accessing a value parameter. To access the contents of a var parameter, you must first load the 32-bit pointer and then access the location it points to. For example,
    function Sum(var X, Y: Integer): Integer; stdcall;
    begin
      asm
        MOV EAX,X
        MOV EAX,[EAX]
        MOV EDX,Y
        ADD EAX,[EDX]
        MOV @Result,EAX
      end;
    end;
Identifiers can be qualified within asm statements. For example, given the declarations
    type
      TPoint = record
        X, Y: Integer;
      end;
      TRect = record
        A, B: TPoint;
      end;
    var
      P: TPoint;
      R: TRect;
the following constructions can be used in an asm statement to access fields.
    MOV EAX,P.X
    MOV EDX,P.Y
    MOV ECX,R.A.X
    MOV EBX,R.B.Y
A type identifier can be used to construct variables on the fly. Each of the following instructions generates the same machine code, which loads the B.X field of the TRect record that EDX points to (the double word at [EDX+8]) into EAX.
    MOV EAX,(TRect PTR [EDX]).B.X
    MOV EAX,TRect([EDX]).B.X
    MOV EAX,TRect[EDX].B.X
    MOV EAX,[EDX].TRect.B.X
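A qualified identifier such as TRect.B.X resolves to a sum of field offsets. The Python sketch below reproduces that arithmetic with the standard ctypes module; it assumes that Integer is a 4-byte value and that the records are laid out without padding, as a C structure of 32-bit integers would be. It models the offsets only and is not Delphi code.

```python
import ctypes

class TPoint(ctypes.Structure):
    # Two 4-byte fields, standing in for X, Y: Integer.
    _fields_ = [("X", ctypes.c_int32), ("Y", ctypes.c_int32)]

class TRect(ctypes.Structure):
    # Two nested records, standing in for A, B: TPoint.
    _fields_ = [("A", TPoint), ("B", TPoint)]

# Each field descriptor exposes its byte offset inside its record,
# so a qualified name is the sum of the offsets along the path.
offset_b_x = TRect.B.offset + TPoint.X.offset
offset_a_y = TRect.A.offset + TPoint.Y.offset

print(offset_b_x, offset_a_y, ctypes.sizeof(TRect))  # prints: 8 4 16
```

Under these assumptions, TRect([EDX]).B.X refers to the double word 8 bytes past the address held in EDX.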
The built-in assembler divides expressions into three classes: registers, memory references, and immediate values.
An expression that consists solely of a register name is a register expression. Examples of register expressions are AX, CL, DI, and ES. Used as operands, register expressions direct the assembler to generate instructions that operate on the CPU registers.
Expressions that denote memory locations are memory references. Delphi's labels, variables, typed constants, procedures, and functions belong to this category.
Expressions that aren't registers and aren't associated with memory locations are immediate values. This group includes Delphi's untyped constants and type identifiers.
Immediate values and memory references cause different code to be generated when used as operands. For example,
    const
      Start = 10;
    var
      Count: Integer;
      . . .
    asm
      MOV EAX,Start          { MOV EAX,xxxx }
      MOV EBX,Count          { MOV EBX,[xxxx] }
      MOV ECX,[Start]        { MOV ECX,[xxxx] }
      MOV EDX,OFFSET Count   { MOV EDX,xxxx }
    end;
Because Start is an immediate value, the first MOV is assembled into a move immediate instruction. The second MOV, however, is translated into a move memory instruction, as Count is a memory reference. In the third MOV, the brackets convert Start into a memory reference (in this case, the word at offset 10 in the data segment). In the fourth MOV, the OFFSET operator converts Count into an immediate value (the offset of Count in the data segment).
The brackets and OFFSET operator complement each other. The following asm statement produces identical machine code to the first two lines of the previous asm statement.
    asm
      MOV EAX,OFFSET [Start]
      MOV EBX,[OFFSET Count]
    end;
Memory references and immediate values are further classified as either relocatable or absolute. Relocation is the process by which the linker assigns absolute addresses to symbols. A relocatable expression denotes a value that requires relocation at link time, while an absolute expression denotes a value that requires no such relocation. Typically, expressions that refer to labels, variables, procedures, or functions are relocatable, since the final address of these symbols is unknown at compile time. Expressions that operate solely on constants are absolute.
The built-in assembler allows you to carry out any operation on an absolute value, but it restricts operations on relocatable values to addition and subtraction of constants.
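The restriction on relocatable values can be expressed as a few type rules. The Python sketch below is a toy model written for this explanation, not the assembler's implementation: Expr is a hypothetical class, the address 1000H is a made-up placeholder for a value the linker would assign, and multiplication stands in for every operation other than addition and subtraction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    value: int
    relocatable: bool = False

    def __add__(self, other):
        # At most one operand of an addition may be relocatable.
        if self.relocatable and other.relocatable:
            raise TypeError("cannot add two relocatable values")
        return Expr(self.value + other.value,
                    self.relocatable or other.relocatable)

    def __sub__(self, other):
        # Only an absolute value (a constant) may be subtracted.
        if other.relocatable:
            raise TypeError("cannot subtract a relocatable value")
        return Expr(self.value - other.value, self.relocatable)

    def __mul__(self, other):
        # Any other operation requires two absolute operands.
        if self.relocatable or other.relocatable:
            raise TypeError("operands must be absolute")
        return Expr(self.value * other.value)

count = Expr(0x1000, relocatable=True)  # placeholder address of a variable
four = Expr(4)                          # absolute constant

print((count + four).relocatable)  # prints: True
print((four * four).value)         # prints: 16
```

In this model an expression such as Count+4 is accepted and stays relocatable, while Count*4 is rejected because the final address of Count is not known until link time.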
Every built-in assembler expression has a type, or more correctly a size, because the assembler regards the type of an expression simply as the size of its memory location. For example, the type of an Integer variable is four, because it occupies 4 bytes. The built-in assembler performs type checking whenever possible, so in the instructions
    var
      QuitFlag: Boolean;
      OutBufPtr: Word;
      . . .
    asm
      MOV AL,QuitFlag
      MOV BX,OutBufPtr
    end;
the assembler checks that the size of QuitFlag is one (a byte), and that the size of OutBufPtr is two (a word). The instruction
MOV DL,OutBufPtr
produces an error because DL is a byte-sized register and OutBufPtr is a word. The type of a memory reference can be changed through a typecast; these are correct ways of writing the previous instruction:
    MOV DL,BYTE PTR OutBufPtr
    MOV DL,Byte(OutBufPtr)
    MOV DL,OutBufPtr.Byte
These MOV instructions all refer to the first (least significant) byte of the OutBufPtr variable.
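The size check described above amounts to comparing two numbers. The following Python sketch is a simplified model, not the assembler's real checker: check_mov is a hypothetical function, the register table lists only a few registers, and the type sizes are those of the predefined type symbols BYTE, WORD, DWORD, QWORD, and TBYTE.

```python
REGISTER_SIZES = {"AL": 1, "DL": 1, "AX": 2, "BX": 2, "EAX": 4, "EDX": 4}
TYPE_SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QWORD": 8, "TBYTE": 10}

def check_mov(register, operand_size, cast=None):
    """Return the effective operand size, or raise on a size mismatch.

    A typecast replaces the declared size of the memory operand.
    """
    size = TYPE_SIZES[cast] if cast else operand_size
    expected = REGISTER_SIZES[register]
    if size != expected:
        raise TypeError(
            f"operand size mismatch: {register} is {expected} byte(s), "
            f"operand is {size} byte(s)")
    return size

OUT_BUF_PTR_SIZE = 2  # OutBufPtr: Word

check_mov("BX", OUT_BUF_PTR_SIZE)               # accepted: both are words
check_mov("DL", OUT_BUF_PTR_SIZE, cast="BYTE")  # accepted after the typecast
```

Calling check_mov("DL", OUT_BUF_PTR_SIZE) without the cast raises TypeError, mirroring the error reported for MOV DL,OutBufPtr.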
In some cases, a memory reference is untyped. One example is an immediate value (Buffer) enclosed in square brackets:
    procedure Example(var Buffer);
    asm
      MOV AL, [Buffer]
      MOV CX, [Buffer]
      MOV EDX, [Buffer]
    end;
The built-in assembler permits these instructions, because the expression [Buffer] has no type. [Buffer] means "the contents of the location indicated by Buffer," and the type can be determined from the first operand (byte for AL, word for CX, and double-word for EDX).
In cases where the type can't be determined from another operand, the built-in assembler requires an explicit typecast. For example,
    INC BYTE PTR [ECX]
    IMUL WORD PTR [EDX]
The following table summarizes the predefined type symbols that the built-in assembler provides in addition to any currently declared Delphi types.
Predefined type symbols
Symbol | Type
BYTE | 1
WORD | 2
DWORD | 4
QWORD | 8
TBYTE | 10
The built-in assembler provides a variety of operators. Its precedence rules differ from those of the Delphi language; for example, in an asm statement, AND has lower precedence than the addition and subtraction operators. The following table lists the built-in assembler's expression operators in decreasing order of precedence.
Precedence of built-in assembler expression operators
Operators | Remarks | Precedence
& | | highest
(...), [...], ., HIGH, LOW | |
+, - | unary + and - |
: | |
OFFSET, TYPE, PTR, *, /, MOD, SHL, SHR, +, - | binary + and - |
NOT, AND, OR, XOR | | lowest
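The effect of the different precedence of AND can be seen on a small constant expression. The Python sketch below spells out both groupings with explicit parentheses; the expression 4 + 1 AND 1 is an example chosen for this illustration, and Python's & operator is used only as a stand-in for bitwise AND.

```python
# Delphi groups "and" with the multiplying operators, so the Delphi
# expression 4 + 1 and 1 is read as 4 + (1 and 1).
delphi_reading = 4 + (1 & 1)

# The built-in assembler ranks AND below binary + and -, so the
# operand 4 + 1 AND 1 is read as (4 + 1) AND 1.
assembler_reading = (4 + 1) & 1

print(delphi_reading, assembler_reading)  # prints: 5 1
```

When a constant expression mixes arithmetic and bitwise operators, explicit parentheses make the intended grouping the same in both contexts.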
The following table defines the built-in assembler's expression operators.
Definitions of built-in assembler expression operators
Operator | Description
& | Identifier override. The identifier immediately following the ampersand is treated as a user-defined symbol, even if the spelling is the same as a built-in assembler reserved symbol.
(...) | Subexpression. Expressions within parentheses are evaluated completely prior to being treated as a single expression element. Another expression can precede the expression within the parentheses; the result in this case is the sum of the values of the two expressions, with the type of the first expression.
[...] | Memory reference. The expression within brackets is evaluated completely prior to being treated as a single expression element. Another expression can precede the expression within the brackets; the result in this case is the sum of the values of the two expressions, with the type of the first expression. The result is always a memory reference.
. | Structure member selector. The result is the sum of the expression before the period and the expression after the period, with the type of the expression after the period. Symbols belonging to the scope identified by the expression before the period can be accessed in the expression after the period.
HIGH | Returns the high-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value.
LOW | Returns the low-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value.
+ | Unary plus. Returns the expression following the plus with no changes. The expression must be an absolute immediate value.
- | Unary minus. Returns the negated value of the expression following the minus. The expression must be an absolute immediate value.
+ | Addition. The expressions can be immediate values or memory references, but only one of the expressions can be a relocatable value. If one of the expressions is a relocatable value, the result is also a relocatable value. If either of the expressions is a memory reference, the result is also a memory reference.
- | Subtraction. The first expression can have any class, but the second expression must be an absolute immediate value. The result has the same class as the first expression.
: | Segment override. Instructs the assembler that the expression after the colon belongs to the segment given by the segment register name (CS, DS, SS, FS, GS, or ES) before the colon. The result is a memory reference with the value of the expression after the colon. When a segment override is used in an instruction operand, the instruction is prefixed with an appropriate segment-override prefix instruction to ensure that the indicated segment is selected.
OFFSET | Returns the offset part (double word) of the expression following the operator. The result is an immediate value.
TYPE | Returns the type (size in bytes) of the expression following the operator. The type of an immediate value is 0.
PTR | Typecast operator. The result is a memory reference with the value of the expression following the operator and the type of the expression in front of the operator.
* | Multiplication. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
/ | Integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
MOD | Remainder after integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
SHL | Logical shift left. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
SHR | Logical shift right. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
NOT | Bitwise negation. The expression must be an absolute immediate value, and the result is an absolute immediate value.
AND | Bitwise AND. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
OR | Bitwise OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
XOR | Bitwise exclusive OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
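As an illustration of the HIGH and LOW operators, the Python sketch below extracts the two bytes of a word-sized constant. The functions high and low are hypothetical stand-ins written for this example; they model the arithmetic only and say nothing about how the assembler implements the operators.

```python
def high(word):
    """High-order 8 bits of a word-sized (16-bit) value."""
    return (word >> 8) & 0xFF

def low(word):
    """Low-order 8 bits of a word-sized (16-bit) value."""
    return word & 0xFF

value = 0xBAD4  # the constant written 0BAD4H or $BAD4 in an asm statement

print(f"{high(value):02X} {low(value):02X}")  # prints: BA D4
```

Recombining the two halves as high(value) shl 8 + low(value) gives back the original word.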
Copyright(C) 2008 CodeGear(TM). All Rights Reserved.