Configuring and operating GT.M with Unicode™ support (optional)

The configure script provides the option to install GT.M with or without Unicode™ support for encoding international character sets. This section describes the system environment required to install and operate GT.M with Unicode™ support. Users who handle data only in ASCII or in other single-byte character sets, such as one of the ISO-8859 representations, and who do not foresee using multi-byte character sets may proceed to the next section.

M mode and UTF-8 mode

A GT.M process can operate in either M mode or UTF-8 mode. In certain circumstances, processes in M mode and processes in UTF-8 mode may concurrently access the same database.

$gtm_chset determines the mode in which a process operates. If it has a value of M, GT.M treats each of the 256 possible values of an 8-bit byte as a single character, which is suitable for many single-language applications.

If $gtm_chset has a value of UTF-8 at process startup, GT.M interprets strings as being encoded in UTF-8. In this mode, all functionality related to Unicode™ becomes available and standard string-oriented operations work on UTF-8 encoded characters. Because a UTF-8 character occupies a variable number of bytes, GT.M detects character boundaries, calculates glyph display width, and performs string conversion between UTF-8 and UTF-16.

If you install GT.M with Unicode™ support, all GT.M components related to M mode reside in your GT.M distribution directory and Unicode-related components reside in the utf8 subdirectory of your GT.M distribution. For processes in UTF-8 mode, in addition to setting $gtm_chset, ensure that $gtm_dist points to the utf8 subdirectory and that $gtmroutines includes the utf8 subdirectory (or the libgtmutil.so therein) rather than its parent directory.
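The UTF-8 mode settings described above might look like the following in a Bourne-style shell. This is only a sketch: the installation directory /usr/lib/fis-gtm/example is a hypothetical placeholder, and the $gtmroutines value is deliberately minimal (real search paths usually list more directories).

```shell
# Hypothetical installation directory; substitute the path of your GT.M distribution.
gtm_install_dir="/usr/lib/fis-gtm/example"

# Select UTF-8 mode for processes started from this shell.
export gtm_chset="UTF-8"

# Point gtm_dist at the utf8 subdirectory rather than its parent.
export gtm_dist="$gtm_install_dir/utf8"

# Search the current directory first, then the utf8 subdirectory
# (illustrative search path only).
export gtmroutines=". $gtm_dist"

echo "gtm_chset=$gtm_chset gtm_dist=$gtm_dist gtmroutines=$gtmroutines"
```

For a process in M mode, the same variables would instead name M as the character set and the parent distribution directory.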

Compiling ICU

GT.M uses ICU 3.6 (or above) to perform Unicode™-related operations. GT.M generates the distribution for Unicode only if ICU 3.6 (or above) is installed on the system. Therefore, if you need Unicode-related functionality, install an appropriate ICU version before installing GT.M.

Note that ICU installation instructions may differ from platform to platform.

By default, GT.M uses the most recent installed version of ICU. GT.M expects ICU to have been built with symbol renaming disabled and issues an error at startup if the installed version of ICU has been built with symbol renaming enabled. If libicu has been compiled with symbol renaming enabled, GT.M requires that $gtm_icu_version be set explicitly.

To use a version of ICU other than the most recent installed one, or a version of ICU built with symbol renaming enabled, set $gtm_icu_version to the major and minor version numbers of the desired ICU, formatted as MajorVersion.MinorVersion (for example, "3.6" to denote ICU-3.6). When $gtm_icu_version is so defined, GT.M attempts to open that specific version of ICU and works regardless of whether symbols in that ICU have been renamed. A missing or ill-formed value for this environment variable causes GT.M to look only for non-renamed ICU symbols.

Note that display widths for a few characters differ starting in ICU 4.0.
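As an illustration of the MajorVersion.MinorVersion format, the sketch below trims a full ICU version string to its first two fields before exporting it. The helper name icu_major_minor and the sample version 3.6.1 are assumptions made for this example; they are not part of GT.M or ICU.

```shell
# Hypothetical helper: reduce a full ICU version string to MajorVersion.MinorVersion.
icu_major_minor() {
    # Keep only the first two dot-separated fields, for example 3.6.1 -> 3.6
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Assumed full version of the ICU build that GT.M should open.
icu_full_version="3.6.1"

gtm_icu_version=$(icu_major_minor "$icu_full_version")
export gtm_icu_version

echo "gtm_icu_version=$gtm_icu_version"
```

Determine the real version string from the ICU installation on your system instead of hard-coding it as this sketch does.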

Note

If you are using gtmprofile, you do not have to set $gtm_icu_version.

After installing ICU 3.6 (or above), you also need to set each of the following environment variables to an appropriate value:

  1. LC_CTYPE

  2. LC_ALL

  3. LD_LIBRARY_PATH

  4. TERM
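A minimal sketch of setting these four variables in a Bourne-style shell follows. Every value is an assumption chosen for illustration: the locale name en_US.UTF-8 (check `locale -a` for the UTF-8 locales available on your system), the ICU library directory /usr/local/lib, and the terminal type xterm all depend on the platform.

```shell
# Illustrative values only; adjust each one to your system.

# A UTF-8 locale, assumed here to be installed on the system.
utf8_locale="en_US.UTF-8"
export LC_CTYPE="$utf8_locale"

# LC_ALL overrides every other LC_* variable, so give it the same UTF-8 locale.
export LC_ALL="$utf8_locale"

# Hypothetical directory holding the ICU shared libraries; it is placed ahead
# of any existing LD_LIBRARY_PATH entries.
icu_lib_dir="/usr/local/lib"
export LD_LIBRARY_PATH="$icu_lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# A terminal type, assuming the terminal emulator can display UTF-8 text.
export TERM="xterm"

echo "LC_CTYPE=$LC_CTYPE LC_ALL=$LC_ALL LD_LIBRARY_PATH=$LD_LIBRARY_PATH TERM=$TERM"
```

Because LC_ALL takes precedence over LC_CTYPE, a non-UTF-8 value in LC_ALL would defeat a UTF-8 value in LC_CTYPE; keeping the two identical avoids that conflict.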