RAD Studio
Using TEncoding for Unicode Files

Many Delphi applications will need to continue to interact with other applications or data sources, many of which can only handle data in ANSI or ASCII. For this reason, the TStrings methods by default write files as ANSI, encoded according to the active code page, and read files according to whether or not the file contains a Byte Order Mark (BOM).

If a BOM is found, the data is read using the encoding that the BOM indicates. If no BOM is found, the data is read as ANSI and up-converted to Unicode based on the current active code page.
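The detection rule above can be sketched with TEncoding.GetBufferEncoding, which inspects the first bytes of a buffer. This is a minimal illustration, not the actual TStrings implementation; the Describe helper and its captions are hypothetical names introduced here.

```delphi
program DetectBom;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: reports which BOM, if any, starts the buffer.
procedure Describe(const Caption: string; const Buffer: TBytes);
var
  Encoding: TEncoding;
  BomLength: Integer;
begin
  // Passing nil asks GetBufferEncoding to detect the encoding from the BOM.
  Encoding := nil;
  BomLength := TEncoding.GetBufferEncoding(Buffer, Encoding);

  if BomLength = 0 then
    Writeln(Caption, ': no BOM, falls back to the default encoding')
  else if Encoding = TEncoding.UTF8 then
    Writeln(Caption, ': UTF-8 BOM, ', BomLength, ' bytes')
  else if Encoding = TEncoding.Unicode then
    Writeln(Caption, ': UTF-16 Little-Endian BOM, ', BomLength, ' bytes')
  else if Encoding = TEncoding.BigEndianUnicode then
    Writeln(Caption, ': UTF-16 Big-Endian BOM, ', BomLength, ' bytes')
  else
    Writeln(Caption, ': other BOM, ', BomLength, ' bytes');
end;

begin
  // GetPreamble returns the BOM bytes that each encoding writes.
  Describe('UTF-8 preamble', TEncoding.UTF8.GetPreamble);
  Describe('UTF-16 LE preamble', TEncoding.Unicode.GetPreamble);
  Describe('UTF-16 BE preamble', TEncoding.BigEndianUnicode.GetPreamble);
  // Plain bytes without a BOM, as an older ANSI/ASCII file would contain.
  Describe('Plain ASCII bytes', TEncoding.ASCII.GetBytes('key=value'));
end.
```

The buffers are built in memory so that the sketch runs without any input file; the same call works on bytes read from the start of a real file.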

Files written with pre-RAD Studio 2009 versions of Delphi can still be read, provided that the active code page at read time is the same as the one in effect when the file was written. Likewise, any file written by RAD Studio 2009 with an ASCII encoding should be readable by pre-RAD Studio 2009 versions.

Any file written by RAD Studio 2009 with any other encoding will include a BOM and will not be readable by pre-RAD Studio 2009 versions. At this point, only the most common BOM formats are detected (UTF-16 Little-Endian, UTF-16 Big-Endian, and UTF-8).

You may want to read or write text data with the TStrings class in a lossless Unicode format, such as Little-Endian UTF-16, Big-Endian UTF-16, UTF-8, or UTF-7. The TEncoding class offers methods and functionality very similar to those of the System.Text.Encoding class in the .NET Framework. To write a file in a specific encoding, pass a TEncoding instance as an extra parameter to SaveToFile:

  var
    S: TStrings;
  begin
    S := TStringList.Create;
    { ... }
    S.SaveToFile('config.txt', TEncoding.UTF8);  // written as UTF-8 with a BOM
  end;

Without the extra parameter, ‘config.txt’ would simply be converted and written out as ANSI, encoded according to the current active code page. You do not need to change the code that reads the file, since TStrings automatically detects the encoding from the BOM and decodes the data accordingly.
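As a rough sketch of that round trip, the following console program writes a UTF-8 file and reads it back without naming an encoding. The file name and the sample line are placeholders chosen for illustration.

```delphi
program RoundTripUtf8;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Written, ReadBack: TStrings;
begin
  Written := nil;
  ReadBack := nil;
  try
    Written := TStringList.Create;
    ReadBack := TStringList.Create;

    // #$00E9 is a non-ASCII character (e with acute accent).
    Written.Add('name=caf' + #$00E9);

    // The extra parameter selects UTF-8; a BOM is written at the start.
    Written.SaveToFile('config.txt', TEncoding.UTF8);

    // No encoding argument: the BOM tells TStrings how to decode the file.
    ReadBack.LoadFromFile('config.txt');

    if ReadBack.Text = Written.Text then
      Writeln('Round trip preserved the text')
    else
      Writeln('Text changed during the round trip');
  finally
    ReadBack.Free;
    Written.Free;
  end;
end.
```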

If you want to force a file to be read and written using a specific code page, create an instance of TMBCSEncoding and pass the code page you want to its constructor. Then use that instance for both reading and writing the file, since the specific code page may not match the user’s active code page.
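A minimal sketch of this approach, assuming code page 1252 (Western European) and a placeholder file name, might look like the following. The caller creates the TMBCSEncoding instance and is therefore responsible for freeing it.

```delphi
program FixedCodePage;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Lines: TStrings;
  Western: TEncoding;
begin
  Lines := nil;
  // 1252 is an assumed example; use the code page your data requires.
  Western := TMBCSEncoding.Create(1252);
  try
    Lines := TStringList.Create;
    Lines.Add('name=caf' + #$00E9);

    // Write with the explicit code page instead of the active one.
    Lines.SaveToFile('legacy.txt', Western);

    // The file has no BOM, so pass the same encoding when reading it back.
    Lines.Clear;
    Lines.LoadFromFile('legacy.txt', Western);

    Writeln('Lines read: ', Lines.Count);
  finally
    Lines.Free;
    Western.Free;
  end;
end.
```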

The same holds for the INI file classes: their data is read and written as ANSI. Since INI files have traditionally been ANSI (ASCII) encoded, it may not make sense to convert them. It will depend on the needs of your application. If you do wish to change to a Unicode format, we will offer ways to use the TEncoding classes to accomplish that as well.

In all the above cases, the internal storage is Unicode, and any data manipulation you do with strings will continue to function as expected. Conversions happen automatically when the data is read and written.
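To make the conversion step concrete, the sketch below encodes one in-memory Unicode string into two byte formats and decodes it again. It calls TEncoding directly rather than going through TStrings, and the sample string is an arbitrary choice for illustration.

```delphi
program EncodingSizes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Sample, Decoded: string;
  Utf8Bytes, Utf16Bytes: TBytes;
begin
  // Four characters, the last one outside the ASCII range.
  Sample := 'caf' + #$00E9;

  // The in-memory string stays the same; only the byte form differs.
  Utf8Bytes := TEncoding.UTF8.GetBytes(Sample);
  Utf16Bytes := TEncoding.Unicode.GetBytes(Sample);

  Writeln('Characters:   ', Length(Sample));
  Writeln('UTF-8 bytes:  ', Length(Utf8Bytes));
  Writeln('UTF-16 bytes: ', Length(Utf16Bytes));

  // Decoding restores the original Unicode string.
  Decoded := TEncoding.UTF8.GetString(Utf8Bytes);
  if Decoded = Sample then
    Writeln('UTF-8 round trip is lossless')
  else
    Writeln('UTF-8 round trip changed the text');
end.
```

GetBytes does not add a BOM; the BOM is written separately when a file or stream is saved.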

Here is a list of codepage identifiers (MSDN).

The following list shows the overload methods that accept a TEncoding parameter:

Copyright(C) 2009 Embarcadero Technologies, Inc. All Rights Reserved.