RAD Studio (Common)
Fundamental Syntactic Elements

This topic introduces the Delphi language character set, and describes the syntax for declaring:

  • Identifiers
  • Numbers
  • Character strings
  • Labels
  • Source code comments

The Delphi language uses the Unicode character encoding for its character set, including alphabetic and alphanumeric Unicode characters and the underscore. It is not case-sensitive. The space character and control characters (U+0000 through U+001F including U+000D, the return or end-of-line character) are blanks. 

The RAD Studio compiler will accept a file encoded in UCS-2 or UCS-4 if the file contains a byte order mark. However, using a format other than UTF-8 may slow compilation. All characters in a UCS-4 encoded source file must be representable in UCS-2 without surrogate pairs. UCS-2 encodings with surrogate pairs (including GB18030) are accepted only if the codepage compiler option is specified. 

Fundamental syntactic elements, called tokens, combine to form expressions, declarations, and statements. A statement describes an algorithmic action that can be executed within a program. An expression is a syntactic unit that occurs within a statement and denotes a value. A declaration defines an identifier (such as the name of a function or variable) that can be used in expressions and statements, and, where appropriate, allocates memory for the identifier.

On the simplest level, a program is a sequence of tokens delimited by separators. A token is the smallest meaningful unit of text in a program. A separator is either a blank or a comment. Strictly speaking, it is not always necessary to place a separator between two tokens; for example, the code fragment

Size:=20;Price:=10;

is perfectly legal. Convention and readability, however, dictate that we write this as

 Size := 20; Price := 10;

Tokens are categorized as special symbols, identifiers, reserved words, directives, numerals, labels, and character strings. A separator can be part of a token only if the token is a character string. Adjacent identifiers, reserved words, numerals, and labels must have one or more separators between them.
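The effect of separators can be seen in a small console program. This is only an illustrative sketch: the program name and the variables Size and Price are taken from the fragment above or invented for the example.

```pascal
program TokenSpacing;
{$APPTYPE CONSOLE}

var
  Size, Price: Integer;

begin
  // No separators between tokens: legal, but hard to read.
  Size:=20;Price:=10;
  WriteLn(Size + Price);   // writes 30

  // The same tokens with conventional spacing.
  Size := 20; Price := 10;
  WriteLn(Size + Price);   // writes 30

  // The reserved words "if" and "then" need separators around them;
  // the special symbol ">" does not.
  if Size>Price then WriteLn('Size is larger');
end.
```

Without the blank after if, the compiler would read ifSize as a single (undeclared) identifier.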

Special Symbols

Special symbols are non-alphanumeric characters, or pairs of such characters, that have fixed meanings. The following single characters are special symbols: 

# $ & ' ( ) * + , - . / : ; < = > @ [ ] ^ { }  

The following character pairs are also special symbols: 

(* (. *) .) .. // := <= >= <>  

The following table shows equivalent symbols:

Special symbol    Equivalent symbol
[                 (.
]                 .)
{                 (*
}                 *)

The left bracket [ is equivalent to the character pair of left parenthesis and period (. 

The right bracket ] is equivalent to the character pair of period and right parenthesis .) 

The left brace { is equivalent to the character pair of left parenthesis and asterisk (*

The right brace } is equivalent to the character pair of right parenthesis and asterisk *)
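As a brief sketch of these equivalences (the program and variable names are invented for illustration), the following uses (. and .) in place of brackets. Blanks are placed around the indexes only to keep the numerals visually separate from the period in each symbol pair.

```pascal
program EquivalentSymbols;
{$APPTYPE CONSOLE}

var
  A: array(. 0..2 .) of Integer;   // same as array[0..2] of Integer

begin
  A(. 0 .) := 1;                   // same as A[0] := 1
  A[1] := 2;
  A(. 2 .) := A[0] + A(. 1 .);
  (* this comment could equally be written with braces *)
  WriteLn(A[2]);                   // writes 3
end.
```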

Note: %, ?, \, !, " (double quotation marks), _ (underscore), | (pipe), and ~ (tilde) are not special characters.

Identifiers

Identifiers denote constants, variables, fields, types, properties, procedures, functions, programs, units, libraries, and packages. An identifier can be of any length, but only the first 255 characters are significant. An identifier must begin with an alphabetic character or an underscore (_) and cannot contain spaces; alphanumeric characters, digits, and underscores are allowed after the first character. Reserved words cannot be used as identifiers.

Note: The .NET SDK recommends against using leading underscores in identifiers, as this pattern is reserved for system use.

Since the Delphi language is case-insensitive, an identifier like CalculateValue could be written in any of these ways:

 CalculateValue
 calculateValue
 calculatevalue
 CALCULATEVALUE

Since unit names correspond to file names, inconsistencies in case can sometimes affect compilation. For more information, see the topic, Unit References and the Uses Clause.
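A minimal sketch of case insensitivity in practice (the program name is invented for illustration): every spelling below refers to the same variable, and standard identifiers and reserved words behave the same way.

```pascal
program CaseInsensitive;
{$APPTYPE CONSOLE}

var
  CalculateValue: Integer;

begin
  CalculateValue := 1;
  calculatevalue := calculateValue + 1;   // same variable in every spelling
  WRITELN(CALCULATEVALUE);                // writes 2
  BEGIN END;                              // reserved words are case-insensitive too
end.
```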

Qualified Identifiers

When you use an identifier that has been declared in more than one place, it is sometimes necessary to qualify the identifier. The syntax for a qualified identifier is 

identifier1.identifier2  

where identifier1 qualifies identifier2. For example, if two units each declare a variable called CurrentValue, you can specify that you want to access the CurrentValue in Unit2 by writing

Unit2.CurrentValue

Qualifiers can be iterated. For example,

Form1.Button1.Click

calls the Click method in Button1 of Form1.

If you don't qualify an identifier, its interpretation is determined by the rules of scope described in Blocks and scope.
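The following sketch shows unit qualification resolving a name clash. The local UpperCase function is hypothetical and exists only to hide the standard one. The example assumes the SysUtils unit is available under that name; in later releases it may need to be written System.SysUtils.

```pascal
program QualifiedNames;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical local function that hides SysUtils.UpperCase in this program.
function UpperCase(const S: string): string;
begin
  Result := '[' + S + ']';
end;

begin
  WriteLn(UpperCase('abc'));            // local declaration: writes [abc]
  WriteLn(SysUtils.UpperCase('abc'));   // unit-qualified: writes ABC
end.
```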

Extended Identifiers

You might encounter identifiers (e.g. types, or methods in a class) having the same name as a Delphi language keyword. For example, a class might have a method called begin. Another example is the CLR class called Type, in the System namespace. Type is a Delphi language keyword, and cannot be used for an identifier name.  

If you qualify the identifier with its full namespace specification, then there is no problem. For example, to use the Type class, you must use its fully qualified name:

var TMyType : System.Type; // Using the fully qualified name avoids
                           // ambiguity with the Delphi language keyword.

As a shorter alternative, the ampersand (&) operator can be used to resolve ambiguities between identifiers and Delphi language keywords. If you encounter a method or type that has the same name as a Delphi keyword, you can omit the namespace specification if you prefix the identifier name with an ampersand. For example, the following code uses the ampersand to disambiguate the CLR Type class from the Delphi keyword type:

var TMyType : &Type; // Prefix with '&' is ok.
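The ampersand prefix can also be tried without any CLR types. The sketch below assumes a compiler version that accepts the & prefix in native code; the record type TBlock and its fields are hypothetical names chosen because they collide with reserved words.

```pascal
program AmpersandEscape;
{$APPTYPE CONSOLE}

type
  TBlock = record            // hypothetical type for illustration
    &begin, &end: Integer;   // field names that match reserved words
  end;

var
  B: TBlock;

begin
  B.&begin := 1;
  B.&end := 10;
  WriteLn(B.&end - B.&begin);   // writes 9
end.
```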

 

Reserved Words

The following reserved words cannot be redefined or used as identifiers.  

Reserved Words  

add            else            initialization  program           then
and            end             inline          property          threadvar
array          except          interface       raise             to
as             exports         is              record            try
asm            file            label           remove            type
begin          final           library         repeat            unit
case           finalization    mod             resourcestring    unsafe
class          finally         nil             sealed            until
const          for             not             set               uses
constructor    function        object          shl               var
destructor     goto            of              shr               while
dispinterface  if              or              static            with
div            implementation  out             strict private    xor
do             in              packed          strict protected
downto         inherited       procedure       string
 

In addition to the words above, private, protected, public, published, and automated act as reserved words within class type declarations, but are otherwise treated as directives. The words at and on also have special meanings, and should be treated as reserved words.

Directives

Directives are words that are sensitive in specific locations within source code. Directives have special meanings in the Delphi language, but, unlike reserved words, appear only in contexts where user-defined identifiers cannot occur. Hence -- although it is inadvisable to do so -- you can define an identifier that looks exactly like a directive.  

Directives  

absolute    export      name         protected    scopedenums
abstract    external    near         public       stdcall
assembler   far         nodefault    published    stored
automated   forward     overload     read         varargs
cdecl       implements  override     readonly     virtual
contains    index       package      register     write
default     inline      pascal       reintroduce  writeonly
deprecated  library     platform     requires
dispid      local       pointermath  resident
dynamic     message     private      safecall
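Because directives are not reserved, a declaration such as the following compiles, although the practice is discouraged. This is a sketch only; the program name is invented, and Name and Index are chosen because they match the directives name and index.

```pascal
program DirectiveNames;
{$APPTYPE CONSOLE}

var
  Name: string;      // "name" is a directive, not a reserved word
  Index: Integer;    // so is "index"

begin
  Name := 'Delphi';
  Index := 3;
  WriteLn(Name, ' ', Index);   // writes Delphi 3
end.
```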
 

 

Numerals

Integer and real constants can be represented in decimal notation as sequences of digits without commas or spaces, and prefixed with the + or - operator to indicate sign. Values default to positive (so that, for example, 67258 is equivalent to +67258) and must be within the range of the largest predefined real or integer type. 

Numerals with decimal points or exponents denote reals, while other numerals denote integers. When the character E or e occurs within a real, it means "times ten to the power of". For example, 7E2 means 7 * 10^2, and 12.25e+6 and 12.25e6 both mean 12.25 * 10^6. 

The dollar-sign prefix indicates a hexadecimal numeral, for example, $8F. Hexadecimal numbers without a preceding - unary operator are taken to be positive values. During an assignment, if a hexadecimal value lies outside the range of the receiving type, an error is raised, except in the case of Integer (32-bit integer), where a warning is raised. In this case, values exceeding the positive range for Integer are taken to be negative numbers in a manner consistent with 2's complement integer representation. 
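The notations described above can be compared side by side. The constant names in this sketch are invented for illustration; the values are the ones used in the text.

```pascal
program NumeralForms;
{$APPTYPE CONSOLE}

const
  Hundreds = 7E2;        // real: 7 * 10^2
  Big1     = 12.25e+6;   // real: 12.25 * 10^6
  Big2     = 12.25e6;    // the same value; the + in the exponent is optional
  HexByte  = $8F;        // hexadecimal integer, 143 in decimal

begin
  WriteLn(HexByte);             // writes 143
  WriteLn(Hundreds = 700.0);    // writes TRUE
  WriteLn(Big1 = Big2);         // writes TRUE
  WriteLn(-$10);                // unary minus applied to a hex numeral: writes -16
end.
```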

For more information about real and integer types, see Data Types. For information about the data types of numerals, see True constants.

Labels

A label is a standard Delphi language identifier with the exception that, unlike other identifiers, labels can start with a digit. Numeric labels can include no more than ten digits - that is, a numeral between 0 and 9999999999. 

Labels are used in goto statements. For more information about goto statements and labels, see Goto statements.
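A minimal sketch of a numeric label follows. The label 10 and the variable Count are arbitrary choices for the example; a loop statement would normally be preferable to goto.

```pascal
program NumericLabel;
{$APPTYPE CONSOLE}

label
  10;                // a label may start with a digit

var
  Count: Integer;

begin
  Count := 0;
10:
  Inc(Count);
  if Count < 3 then
    goto 10;         // jump back to the labeled statement
  WriteLn(Count);    // writes 3
end.
```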

Character Strings

A character string, also called a string literal or string constant, consists of a quoted string, a control string, or a combination of quoted and control strings. Separators can occur only within quoted strings. 

A quoted string is a sequence of characters, from an ANSI or multibyte character set, written on one line and enclosed by apostrophes. A quoted string with nothing between the apostrophes is a null string. Two sequential apostrophes in a quoted string denote a single character, namely an apostrophe.  

The string is represented internally as a Unicode string encoded as UTF-16. Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters not in the BMP require 4 bytes. 

For example,

 'CodeGear'                                { CodeGear }
 'You''ll see'                             { You'll see }
 'アプリケーションを Unicode 対応にする'
 ''''                                      { ' }
 ''                                        { null string }
 ' '                                       { a space }

A control string may also be written as a sequence of one or more integers, each of which consists of the # symbol followed by an unsigned integer constant from 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, and denotes the character corresponding to a specified code value. Each integer is represented internally by 2 bytes in the string. This is useful for representing control characters and multibyte characters. The control string

#89#111#117

is equivalent to the quoted string

'You'

You can combine quoted strings with control strings to form larger character strings. For example, you could use

'Line 1'#13#10'Line 2'

to put a carriage-return line-feed between 'Line 1' and 'Line 2'. However, you cannot concatenate two quoted strings in this way, since a pair of sequential apostrophes is interpreted as a single character. (To concatenate quoted strings, use the + operator or simply combine them into a single quoted string.) 
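The rules for quoted and control strings can be checked with a short program. This is an illustrative sketch; the names TwoLines and S are invented for the example.

```pascal
program StringLiterals;
{$APPTYPE CONSOLE}

const
  TwoLines = 'Line 1'#13#10'Line 2';   // quoted + control + quoted

var
  S: string;

begin
  WriteLn(#89#111#117 = 'You');        // writes TRUE
  WriteLn('You''ll see');              // writes You'll see

  S := '''';                           // a single apostrophe
  WriteLn(Length(S));                  // writes 1

  S := '';                             // the null string
  WriteLn(Length(S));                  // writes 0

  S := TwoLines;
  WriteLn(Length(S));                  // 6 + 2 + 6 characters: writes 14

  WriteLn('Line 1' + 'Line 2');        // quoted strings are joined with +
end.
```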

A character string is compatible with any string type and with the PChar type. Since an AnsiString type may contain multibyte characters, a character string with one character, single or multibyte, is compatible with any character type. When extended syntax is enabled (with compiler directive {$X+}), a nonempty character string of length n is compatible with zero-based arrays and packed arrays of n characters. For more information, see Data Types.

Comments and Compiler Directives

Comments are ignored by the compiler, except when they function as separators (delimiting adjacent tokens) or compiler directives.  

There are several ways to construct comments:

{ Text between a left brace and a right brace constitutes a comment. }
(* Text between a left-parenthesis-plus-asterisk and an asterisk-plus-right-parenthesis is also a comment *)
// Any text between a double-slash and the end of the line constitutes a comment.

Comments that are alike cannot be nested. For instance, {{}} will not work, but (*{}*) will. This latter form is useful for commenting out sections of code that also contain comments. 
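The sketch below shows the three comment forms and uses (* ... *) to disable lines that already contain brace and double-slash comments. The program name and the disabled statements are invented for illustration.

```pascal
program CommentStyles;
{$APPTYPE CONSOLE}

begin
  { a brace comment }
  (* a parenthesis-star comment *)
  // a line comment

  (* Commenting out code that itself contains other comment styles:
  WriteLn('skipped');  { brace comment inside }
  WriteLn('skipped');  // line comment inside
  *)

  WriteLn('done');   // writes done
end.
```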

Here are some recommendations about how and when to use the three types of comment characters:

  • Use the double-slash (//) for commenting out temporary changes made during development. You can use the Code Editor's convenient CTRL+/ (slash) mechanism to quickly insert the double-slash comment character while you are working.
  • Use the parenthesis-star "(*...*)" both for development comments and for commenting out a block of code that contains other comments. This comment character permits multiple lines of source, including other types of comments, to be removed from consideration by the compiler.
  • Use the braces ({}) for in-source documentation that you intend to remain with the code.

A comment that contains a dollar sign ($) immediately after the opening { or (* is a compiler directive. For example,

{$WARNINGS OFF}

tells the compiler not to generate warning messages.
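To illustrate the difference between a directive and an ordinary comment, the following sketch switches warnings off and back on, using both delimiter forms. The program name is invented, and the placement of the directives is an arbitrary choice for the example.

```pascal
program DirectiveComment;
{$APPTYPE CONSOLE}

{$WARNINGS OFF}    // compiler directive: suppress warnings from here on
(*$WARNINGS ON*)   // the same kind of directive in (* *) form: re-enable them

{ $WARNINGS OFF }  // ordinary comment: the $ does not immediately follow the brace

begin
  WriteLn('done');
end.
```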

Copyright(C) 2009 Embarcadero Technologies, Inc. All Rights Reserved.