A transaction processing application makes a series of database updates. GT.M executes these updates online or from data-driven logic, commonly called "batch."

The processing model in each case is a transaction or a unit of work initiated by client input such as a request to transfer funds from one account to another, or as the next logical unit of work such as posting interest on the next account. This general model holds both for applications where users log in directly to a host (perhaps using terminal emulation from a workstation) and those where a client communicates with a host server process. This section lists key considerations for a transaction processing application to:

FIS recommends planning upfront for database consistency when designing the architecture of an LMS application. Planning parameters for the application's architecture may include:

The following diagram represents a logical dual-site application architecture.

This diagram illustrates an application architecture that can reliably perform batch and online updates in a tiered environment. It addresses the online updates via the Message Transport (which has to be able to reroute communications to the current originating instance after a cutover) and batch updates via an external file (which has to be made available to the current originating instance after a cutover).

An application server is a GT.M process that accepts, processes, and responds to messages provided through the Message Transport. Application servers may exist as a “cloud” whose size is determined by the capacity of the node and the needs of the application. On the originating instance, an application server process receives messages and processes application transactions. The application logic issues the TSTART command and a series of SET (also KILL and MERGE) commands that provisionally update the database, then a TCOMMIT command to finalize the transaction. The process may directly WRITE a reply, but another process may act as an agent that takes that reply from a database record and sends it to the originator.

This section describes how a well-designed messaging system makes an application's architecture more cutover-ready by using an example in which the originating instance fails after the TCOMMIT, but before the system generates a reply and transmits it to the client.

As noted in the previous section, application servers on the originating instance respond to messages from clients delivered over a network for online operations in a tiered environment. Each client message results in zero (inquiry) or one update transaction on the server. The network delivering messages must be robust. This means each message must either be delivered exactly once to an application server on the originating instance, or result in client notification of the delivery failure. The messaging system must handle situations such as failure on the originating instance after the client transmits the message but before the originating instance receives it. Integration of the message delivery system with the logic determining whether an instance is an originating instance or replicating instance at any time reduces risk and switchover time.

Application logic typically responds to client messages with a reply generated immediately after the TCOMMIT for a transaction. The application and the message architecture must handle the scenario in which the originating system fails after the TCOMMIT, but before the system generates a reply and transmits it to the client. In such a scenario, the client waits for a response and eventually times out and retries the message.

An LMS application can handle this situation by designing the message structure to have a unique message identifier (MSGID), and the application to include the MSGID in the database as part of the TCOMMIT.
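As a minimal illustrative sketch of this design (the globals ^ACCT and ^REPLY, and the local variables msgid, acct, amt, and reply, are hypothetical and not part of any GT.M API), the reply can be recorded under its MSGID inside the same transaction that performs the update:

    ; check for a duplicate (retried) message before doing any work
    IF $DATA(^REPLY(msgid)) WRITE ^REPLY(msgid),! QUIT
    TSTART ():SERIAL
    SET ^ACCT(acct,"balance")=$GET(^ACCT(acct,"balance"))-amt
    SET reply="OK|"_msgid
    SET ^REPLY(msgid)=reply        ; reply stored under the MSGID in the same transaction
    TCOMMIT
    WRITE reply,!

Because the reply is committed atomically with the update, a retried MSGID found in ^REPLY after a cutover indicates that the transaction already committed, and the stored reply can simply be resent.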

If the originating instance crashes after committing the transaction and the cutover logic makes the former replicating instance the new originating instance, the new originating instance then receives the retried message with the same MSGID from the client. In this case, one of the following can occur:

Cutover is the process of reconfiguring an LMS application so that a replicating instance takes over as the current originating instance. This might be a planned activity, such as bringing down the originating instance for hardware maintenance, or it may be unplanned such as maintaining application availability when the originating instance or the network to the originating instance goes down.

Implementing and managing cutover is outside the scope of GT.M. FIS recommends adhering to the following rules when designing cutover:

[Important]Important

A cutover is a sound practice for maximizing business continuity. FIS strongly recommends setting up a cutover mechanism to keep a GT.M application up in the face of disruptions that arise due to errors in the underlying platform. In environments where a cutover is not feasible due to operational constraints, consider setting up an Instance Freeze mechanism for your application. For more information, refer to “Instance Freeze”.

In the event of run-time conditions such as no disk space, I/O problems, or disk structure damage, some operational policies favor deferring maintenance to a convenient time as long as it does not jeopardize the functioning of the GT.M application. For example, if the journal file system runs out of disk space, GT.M continues operations with journaling turned off and moves to the replication WAS_ON state until journaling is restored. If there is a problem with one database file or journal file, processes that update other database regions continue normal operation.

Some operational policies prefer stopping the GT.M application in such events to promptly perform maintenance. For such environments, GT.M has a mechanism called "Instance Freeze".

The Instance Freeze mechanism provides an option to stop all updates on the region(s) of an instance as soon as a process encounters an error while writing to a journal or database file. This mechanism safeguards application data from a possible system crash after such an error.

The environment variable gtm_custom_errors specifies the complete path to the file that contains a list of errors that should automatically stop all updates on the region(s) of an instance. The error list comprises error mnemonics (one per line and in capital letters) from the GT.M Message and Recovery Guide. The GT.M distribution kits include a custom_errors_sample.txt file which can be used as a target for the gtm_custom_errors environment variable.
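For example, one way to point processes at the sample file shipped with the distribution (the exact location of custom_errors_sample.txt within your installation is an assumption here) is:

    $ export gtm_custom_errors=$gtm_dist/custom_errors_sample.txt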

[Note]Note

When a process that is part of an instance configured for instance freeze behavior encounters an error with journaling, it freezes the instance and invokes its own error trap even if it does not have the gtm_custom_errors environment variable set.

You can enable the Instance Freeze mechanism selectively on any region(s) of an instance. For example, a region that represents a patient or financial record may qualify for an Instance Freeze whereas a region with an easily rebuilt cross reference index may not. You can also promptly freeze an instance irrespective of whether any region is enabled for Instance Freeze.

MUPIP SET -[NO]INST[_FREEZE_ON_ERROR] [-REGION|-FILE] enables custom errors in region to automatically cause an Instance Freeze. MUPIP REPLICATE -SOURCE -FREEZE={ON|OFF} -[NO]COMMENT[='"string"'] promptly sets or clears an Instance Freeze on an instance irrespective of whether any region is enabled for Instance Freeze (with MUPIP SET -INST_FREEZE_ON_ERROR).
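For example (the region name and the comment text below are illustrative):

    $ mupip set -inst_freeze_on_error -region "DEFAULT"
    $ mupip replicate -source -freeze=on -comment='"Planned maintenance"'
    $ mupip replicate -source -freeze=off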

A process that is not in a replicated environment ignores $gtm_custom_errors. The errors in the custom errors file must have a context in one of the replicated regions, and the process recognizing the error must have the replication Journal Pool open. For example, an error like UNDEF cannot cause an Instance Freeze because it is not related to the instance. It also means that, for example, standalone MUPIP operations can neither cause nor honor an Instance Freeze because they do not have access to the replication Journal Pool. A process with access to the replication Journal Pool must honor an Instance Freeze even if it does not have a custom error file and therefore cannot initiate an Instance Freeze.

Depending on the error, removing an Instance Freeze is either operator driven or automatic. GT.M automatically removes Instance Freezes placed because of no disk space; for all other errors, an operator must clear the Instance Freeze manually. For example, GT.M automatically places an Instance Freeze when it detects a DSKNOSPCAVAIL message in the operator log, and automatically clears that Instance Freeze when operator intervention resolves the no-disk-space condition. During an Instance Freeze, GT.M modifies the NOSPACEEXT message from error (-E-) to warning (-W-) to indicate that it is performing the extension even though the available space is less than the specified extension amount. The following errors are listed in the custom_errors_sample.txt file; note that GT.M automatically clears the Instance Freeze set with DSKNOSPCAVAIL when disk space becomes available, while all other errors require operator intervention.

During an Instance Freeze, attempts to update the database and journal files hang but operations like journal file extract which do not require updating the database file(s) continue normally. When an Instance Freeze is cleared, processes automatically continue with no auxiliary operational or programmatic intervention. The Instance Freeze mechanism records both the freeze and the unfreeze in the operator log.

[Note]Note

As of V6.0-000, the Instance Freeze facility is a field test grade implementation. Because there are a large number of errors that GT.M can recognize and because GT.M has several operational states, the GT.M team has tested the errors in the custom_errors_sample.txt which are consistent with what we expect to be common usage. If you experience problems trying to add other errors or have concerns about plans to add other errors, please consult your GT.M support channel.

GT.M includes a plugin reference implementation that provides the functionality to secure the replication connection between instances using Transport Layer Security (TLS; previously known as SSL). Just as database encryption helps protect against unauthorized access to a database by an unauthorized process that is able to access disk files (data at rest), the plugin reference implementation secures the replication connection between instances and helps prevent unauthorized access to data in transit. FIS has tested GT.M replication operations with the TLS plugin reference implementation using OpenSSL (http://www.openssl.org). A future GT.M release may include support for popular and widely available TLS implementations / cryptography packages other than OpenSSL. Note that the plugin architecture allows you to choose a TLS implementation and a cryptography package. FIS neither recommends nor supports any specific TLS implementation or cryptography package, and you should ensure that you have confidence in, and support for, whichever package you intend to use in production.

[Note]Note

Database encryption and TLS/SSL replication are just two of many components of a comprehensive security plan. The use of database encryption and TLS replication should follow from a good security plan. This section discusses encrypted GT.M replicating instances and securing the replication connection between them using TLS/SSL; it does not discuss security plans.

[Important]Important

You can set up TLS replication between instances without using GT.M Database Encryption. GT.M Database Encryption is not a prerequisite to using TLS replication.

The general procedure of creating a TLS replication setup includes the following tasks:

  1. Create a new database or use an existing one.

  2. “Creating a root-level certification authority”

  3. “Creating leaf-level certificates”

  4. “Creating a configuration file”

  5. Enabling replication and starting the Source and Receiver Servers with the TLSID qualifier (see the illustrative commands following this list).
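For example, the following sketch shows a Source Server and a Receiver Server started with a TLSID of PRODUCTION; the instance name, host, port, and log file paths are illustrative, and the corresponding gtmtls_passwd_<TLSID> environment variable (described later in this section) must be set in both environments:

    $ mupip replicate -source -start -instsecondary=Shanghai -secondary=replicator.example.com:4000 -tlsid=PRODUCTION -log=/var/log/gtm/source.log
    $ mupip replicate -receiver -start -listenon=4000 -tlsid=PRODUCTION -log=/var/log/gtm/receiver.log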

To use TLS, the communicating parties need to authenticate each other. If the authentication succeeds, the parties encrypt the subsequent communication. TLS authentication uses certificates signed by Certificate Authorities (CAs). Certificate Authorities' certificates themselves are signed (and trusted) by other CAs, eventually leading to a Root CA, which self-signs its own certificate. Although the topic of certificates, and the use of software such as OpenSSL, is well beyond the scope of GT.M documentation, the steps below illustrate the quick-start creation of a test environment using Source and Receiver certificates with a self-signed Root CA certificate.

Creating a root certificate authority involves three steps.
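A minimal OpenSSL sketch of those steps follows (the key size, validity period, and file names are illustrative assumptions; the file names match the ca.key and ca.crt used in the rest of this section):

    # (a) generate a password-protected private key for the root CA
    $ openssl genrsa -aes128 -out ca.key 2048
    # (b) generate a self-signed root certificate from that key
    $ openssl req -new -x509 -days 365 -key ca.key -out ca.crt
    # (c) prepare the CA directory structure and openssl.cnf used later by the "openssl ca" command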

At this point, ca.crt is a root certificate that can be used to sign other certificates (including intermediate certificate authorities). The private key of the root certificate must be protected from unauthorized access.

The root certificate is used to sign regular, leaf-level certificates. Below are steps showing the creation of a certificate to be used to authenticate a GT.M Source Server with a GT.M Receiver Server (and vice-versa).

  1. Generate a private key. This is identical to step (a) of root certificate generation.

  2. Generate a certificate sign request with the OpenSSL command openssl req -new -key client.key -out client.csr. The command first prompts for the password of the private key followed by a series of interactive queries regarding the attributes of the certificate. Below is sample output:

    Enter pass phrase for client.key:
    You are about to be asked to enter information that will be incorporated into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.

    -----

    Country Name (2 letter code) [AU]:US
    State or Province Name (full name) [Some-State]:PA
    Locality Name (eg, city) []:Malvern
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:XYZQ International
    Organizational Unit Name (eg, section) []: OurSourceServer
    Common Name (e.g. server FQDN or YOUR name) []:www.xyzq.com
    Email Address []:xyzq@xyzq.com
    Please enter the following 'extra' attributes to be sent with your certificate request
    A challenge password []:challenge
    An optional company name []:XYZQ Pvt. Ltd
    

    Typically, the organization that generates the certificate signing request then sends it to a certificate authority (or a root certificate authority), which audits the request and signs the certificate with its private key, thereby establishing that the certificate authority trusts the company/organization that generated the certificate and requested its signing. In this example, we sign the certificate signing request with the root certificate generated above.

  3. Sign the certificate signing request with an OpenSSL command like:

    openssl ca -config $PWD/openssl.cnf -in client.csr -out client.crt

    The output of this command looks like the following:

    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    Country Name (2 letter code) [US]: US
    State or Province Name (full name) [Philadelphia]:Illinois
    City (e.g., Malvern) [Malvern]:Chicago
    Organization Name (eg, company) [FIS]:FIS
    Organizational Unit Name (eg, section) [GT.M]:GT.M
    Common Name (e.g. server FQDN or YOUR name) [localhost]:fisglobal.com
    Email Address (e.g. helen@gt.m) []:root@gt.m
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:
    Using configuration from /usr/lib/ssl/openssl.cnf
    Enter pass phrase for ./certs/ca.key:
    Check that the request matches the signature
    Signature ok
    Certificate Details:
     Serial Number: 14 (0xe)
     Validity
     Not Before: Jun 11 14:06:53 2014 GMT
     Not After : Jun 12 14:06:53 2014 GMT
     Subject:
     countryName = US
     stateOrProvinceName = Illinois
     organizationName = FIS
     organizationalUnitName = GT.M
     commonName = fisglobal.com
     emailAddress = helen@gt.m
     X509v3 extensions:
     X509v3 Basic Constraints:
     CA:FALSE
     Netscape Comment:
     OpenSSL Generated Certificate
     X509v3 Subject Key Identifier:
     96:FD:43:0D:0A:C1:AA:6A:BB:F3:F4:02:D6:1F:0A:49:48:F4:68:52
     X509v3 Authority Key Identifier:
     keyid:DA:78:3F:28:8F:BC:51:78:0C:5F:27:30:6C:C5:FE:B3:65:65:85:C9
    
    Certificate is to be certified until Jun 12 14:06:53 2014 GMT (1 days)
    Sign the certificate? [y/n]:y
    
    
    1 out of 1 certificate requests certified, commit? [y/n]y
    Write out database with 1 new entries
    Data Base Updated
[Important]Important

Keep the self-signed root certificate authority and leaf-level certificates in a secure location. Protect their directories with 0500 permissions and the individual files with 0400 permissions so that unauthorized users cannot access them.

Please refer to OpenSSL documentation http://www.openssl.org/docs/ for information on how to create intermediate CAs, Diffie-Hellman parameters, Certificate Revocation Lists, and so on.

The configuration file is divided into two sections: the database encryption section and the TLS section. The database encryption section contains the list of database files and their corresponding key files, and the TLS section contains a TLSID label that identifies the location of the root certificate authority certificate in PEM format and the leaf-level certificate with its corresponding private key file. Note that use of the gtmcrypt_config environment variable requires the libconfig library to be installed.

After creating a leaf-level certificate that is signed by a self-signed root certificate, create a configuration file (one for Source and the other for Receiver Server) with the following format:

tls: {
     verify-depth: 7;
     CAfile: "/path/to/ca.crt";
     tls : {
           format: "PEM";
           cert: "/path/to/client.crt";
           key: "/path/to/client.key";
     };
};

where the inner label (tls in this example) specifies the TLSID that is used to start the Source/Receiver Server, CAfile specifies the path to the root certificate authority certificate, cert specifies the path to the leaf-level certificate, and key specifies the path to the private key file.

Set the gtmcrypt_config environment variable to point to the path to the configuration file. The environment variable gtmtls_passwd_<tlsid> must specify an obfuscated version of the password for the client's private key. Use the maskpass utility provided with your GT.M distribution to create an obfuscated password.
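For example (the configuration file path and the PRODUCTION TLSID are illustrative; the maskpass location shown is typical of a plugin installation under $gtm_dist):

    $ export gtmcrypt_config=/path/to/gtmtls.cfg
    $ $gtm_dist/plugin/gtmcrypt/maskpass     # prompts for the private key password and prints an obfuscated version
    $ export gtmtls_passwd_PRODUCTION=<obfuscated password printed by maskpass>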

Here is a sample configuration file:

/* Database encryption section */
 
database: {
      keys: (
            {
                  dat: "/tmp/mumps.dat";  /* Encrypted database file. */
                  key: "/tmp/mumps.key";  /* Encrypted symmetric key. */
            },
            {
                  dat: "/tmp/a.dat";
                  key: "/tmp/a.key";
            },
            ...
      );
}
 
/* TLS section */
 
tls: {
      /* Certificate Authority (CA) verify depth provides an upper limit on the number of CAs to look up for verifying a given
       * certificate. The depth count is described as ''level 0:peer certificate'', ''level 1: CA certificate'',
       * ''level 2: higher level CA certificate'', and so on. The default verification depth is 9.
       */
      verify-depth: 7;
 
      /* CAfile: points to a file, in PEM format, describing the trusted CAs. The file can contain several CA certificates identified by:
       * -----BEGIN CERTIFICATE-----
       * ... (CA certificate in base64 encoding) ...
       * -----END CERTIFICATE-----
       * sequences.
       */
      CAfile: "/home/jdoe/current/tls/certs/CA/gtmCA.crt";
 
      /* CApath: points to a directory containing CA certificates in PEM format. The files each contain one CA certificate. The files are
       * looked up by the CA subject name hash value, which must hence be available. If more than one certificate with the same
       * name hash value exists, the extension must be different (e.g. 9d66eef0.0, 9d66eef0.1 etc). The directory is typically
       * created by the OpenSSL tool 'c_rehash'.
       */
      CApath: "/home/jdoe/current/tls/certs/CA/";
      
      /* Diffie-Hellman parameters used for key-exchange. Either none or both have to be specified. If neither is specified,
       * then the data is encrypted with the same keys that are used for authentication.
       */
      dh512: "/home/jdoe/current/tls/dh512.pem";
      dh1024: "/home/jdoe/current/tls/dh1024.pem";
 
      /* crl: points to a file containing list of revoked certificates. This file is created by the openssl utility. */
      crl: "/home/jdoe/current/tls/revocation.crl";
 
      /* Timeout (in seconds) for a given session. If a connection disconnects and resumes within this time interval, the session 
       * is reused to speed up the TLS handshake. A value of 0 forces sessions to not be reused. The default value is 1 hour.
       */
      session-timeout: 600;
 
      /* List of certificate/key pairs specified by identifiers. */
      PRODUCTION: {
            /* Format of the certificate and private key pair. Currently, the GT.M TLS plug-in only supports PEM format. */
            format: "PEM";
            /* Path to the certificate. */
            cert: "/home/jdoe/current/tls/certs/Malvern.crt";
            /* Path to the private key. If the private key is protected by a passphrase, an obfuscated version of the password
             * should be specified in the environment variable which takes the form gtmtls_passwd_<identifier>. For instance,
             * for the below key, the environment variable should be 'gtmtls_passwd_PRODUCTION'.
             * Currently, the GT.M TLS plug-in only supports RSA private keys.
             */
            key: "/home/jdoe/current/tls/certs/Malvern.key";
      };
 
      DEVELOPMENT: {
            format: "PEM";
            cert: "/home/jdoe/current/tls/certs/BrynMawr.crt";
            key: "/home/jdoe/current/tls/certs/BrynMawr.key";
      };
};

If you are using the environment variable gtm_dbkeys to point to the master key file for database encryption, please convert that file to the libconfig configuration file format pointed to by the $gtmcrypt_config environment variable at your earliest convenience. Effective V6.1-000, the gtm_dbkeys environment variable and the master key file it points to are deprecated in favor of the gtmcrypt_config environment variable. Although V6.1-000 supports the use of $gtm_dbkeys for database encryption, FIS plans to discontinue support for it in the very near future. To convert master key files to libconfig format configuration files, download the CONVDBKEYS.m program from http://tinco.pair.com/bhaskar/gtm/doc/articles/downloadables/CONVDBKEYS.m and follow the instructions in the comments near the top of the program file.

Between BEFORE_IMAGE journaling and NOBEFORE_IMAGE journaling, there is no difference in the final state of a database / instance recovered after a crash. The difference between before image and nobefore journaling is in:

When an instance goes down, its recovery consists of (at least) two steps: recovery of the instance itself (hardware, OS, file systems, and so on) - say tsys - and recovery of the database. tsys is almost completely[4] independent of the type of GT.M journaling.

For database recovery:

  • With BEFORE_IMAGE journaling, the time is simply that needed to execute a mupip journal -recover -backward "*" command or, when using replication, a mupip journal -rollback -backward "*" command. This uses before image records in the journal files to roll the database files back to their last epochs, and then forward to the most current updates. If this takes tbck, the total recovery time is tsys+tbck.

  • With NOBEFORE_IMAGE journaling, the time is that required to restore the last backup, say, trest plus the time to perform a mupip journal -recover -forward "*" command, say tfwd, for a total recovery time of tsys+trest+tfwd. If the last backup is available online, so that "restoring the backup" is nothing more than setting the value of an environment variable, trest=0 and the recovery time is tsys+tfwd.
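As a purely illustrative worked example (the values below are assumptions consistent with the round numbers discussed next, not measurements):

    tsys ~ 10 minutes, tbck ~ 1 minute, trest ~ 0 (backup online), tfwd ~ 30 minutes
    BEFORE_IMAGE   : tsys + tbck         ~ 10 + 1      = 11 minutes
    NOBEFORE_IMAGE : tsys + trest + tfwd ~ 10 + 0 + 30 = 40 minutes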

Because tbck is less than tfwd, tsys+tbck is less than tsys+tfwd. In very round numbers, tsys may be minutes to tens of minutes, tfwd may be tens of minutes and tbck may be in tens of seconds to minutes. So, recovering the instance A might (to a crude first approximation) be a half order of magnitude faster with BEFORE_IMAGE journaling than with NOBEFORE_IMAGE journaling. Consider two deployment configurations.

  1. Where A is the sole production instance of an application, halving or quartering the recovery time of the instance is significant, because when the instance is down, the enterprise is not in business. The difference between a ten minute recovery time and a thirty minute recovery time is important. Thus, when running a sole production instance, or a sole production instance backed up by an underpowered or not easily accessed "disaster recovery site," before image journaling with backward recovery better suits a production deployment. Furthermore, in this situation, there is pressure to bring A back up soon, because the enterprise is not in business - pressure that increases the probability of human error.

  2. With two equally functional and accessible instances, A and B, deployed in an LMS configuration at a point in time when A, running as the originating instance replicating to B, crashes, B can be switched from a replicating instance to an originating instance within seconds. An appropriately configured network can change the routing of incoming accesses from one instance to the other in seconds to tens of seconds. The enterprise is down only for the time required to ascertain that A is in fact down, and to make the decision to switch to B - perhaps a minute or two. Furthermore, B is in a "known good" state; therefore, a strategy of "if in doubt, switchover" is entirely appropriate. This time, tswch, is independent of whether A and B are running BEFORE_IMAGE journaling or NOBEFORE_IMAGE journaling. The difference between BEFORE_IMAGE journaling and NOBEFORE_IMAGE journaling is the difference in time taken subsequently to recover A, so that it can be brought up as a replicating instance to B. If NOBEFORE_IMAGE journaling is used and the last backup is online, there is no need to first perform a forward recovery on A using its journal files. Once A has rebooted:

    • Extract the unreplicated transactions from the crashed environment

    • Connect the backup as a replicating instance to B and allow it to catch up.

[Note]Note

Applications that can take advantage of the forthcoming LMX capability will essentially make tswch zero when used with a suitable front-end network.

Cost

The cost of using an LMS configuration is at least one extra instance plus network bandwidth for replication. There are trade-offs: with two instances, it may be appropriate to use less expensive servers and storage without materially compromising enterprise application availability. In fact, since GT.M allows replication to as many as sixteen instances, it is not unreasonable to use commodity hardware[a] and still save total cost.

Storage

Each extra instance of course requires its own storage for databases and journal files. Nobefore journal files are smaller than the journal files produced by before image journaling, with the savings potentially offset if a decision is made to retain an online copy of the last backup (whether this nets out to a saving or a cost depends on the behavior of the application and on operational requirements for journal file retention).

Performance

IO bandwidth requirements of nobefore journaling are less than those of before image journaling, because GT.M does not write before image journal records or flush the database.

  • With before image journaling, the first time a database block is updated after an epoch, GT.M writes a before image journal record. This means that immediately after an epoch, given a steady rate of updates, there is an increase in before image records (because every update changes at least one database block and generates at least one before image journal record). As the epoch proceeds, the frequency of writing before image records falls back to the steady level[b] – until the next epoch. Before image journal records are larger than journal records that describe updates.

  • At epochs, both before image journaling and nobefore journaling flush journal blocks and perform an fsync() on journal files[c]. When using before image journaling, GT.M additionally ensures all dirty database blocks have been written and does an fsync()[d]; with nobefore journaling, GT.M does not take these steps.

Because IO subsystems are often sized to accommodate peak IO rates, choosing NOBEFORE_IMAGE journaling may allow more economical hardware without compromising application throughput or responsiveness.

[a] GT.M absolutely requires the underlying computer system to perform correctly at all times. So, the use of error correcting RAM, and mirrored disks is advised for production instances. But, it may well be cost effective to use servers without redundant power supplies or hot-swappable components, to use RAID rather than SAN for storage, and so on.

[b] How much the steady level is lower depends on the application and workload.

[c] Even flushing as many as 20,000 journal buffers, which is more than most applications use, is only 10MB of data. Furthermore, when GT.M's SYNC_IO journal flag is specified, the fsync() operation requires no physical IO.

[d] The volume of dirty database blocks to be flushed can be large. For example, 80% of 40,000 4KB database blocks being dirty would require 128MB of data to be written and flushed.

A system crash can, and often will, damage a database file, leaving it structurally inconsistent. With before image journaling, normal MUPIP recovery/rollback repairs such damage automatically and restores the database to the logically consistent state as of the end of the last transaction committed to the database by the application. Certain benign errors may also occur (refer to the "Maintaining Database Integrity" chapter). These must be repaired on the (now) replicating instance at an appropriate time, and are not considered "damage" for the purpose of this discussion. Even without before image journaling, a replicating instance (particularly one that is multi-site) may have sufficient durability in the aggregate of its instances so that backups (or copies) from an undamaged instance can always repair a damaged instance.

[Note]Note

If the magnetic media of the database and/or the journal file is damaged (e.g., a head crash on a disk that is not mirrored), automatic repair is problematic. For this reason, it is strongly recommended that organizations use hardware mirroring for magnetic media.

[Caution]Caution

Misuse of UNIX commands, such as kill -9 and ipcrm, by processes running as root can cause database damage.

Considering the high level at which replication operates, the logical dual-site nature of GT.M database replication makes it virtually impossible for related database damage to occur on both originating and replicating instances.

To maintain application consistency, do not use DSE to repair or change the logical content of a replicated region on an originating instance.

[Note]Note

Before attempting manual database repair, FIS strongly recommends backing up the entire database (all regions).

After repairing the database, bring up the replicating instance and back up the database with new journal files. MUPIP BACKUP -ONLINE allows replication to continue during the backup. As stated in the Journaling chapter, the journal files prior to the backup are not useful for normal recovery.

GT.M database replication ensures that the originating instance and its replicating instances are logically identical, excluding latency in replication. During rolling upgrades, the originating instance and its replicating instances are logically equivalent but the exact storage of M global variables may differ. FIS recommends ensuring that all database updates to replicated regions occur only on the originating instance. GT.M replicates all changes on the originating instance to its replicating instances.

LMS applications require procedures for a number of situations, including both normal operation and failure scenarios. Although LMS applications can operate manually, the requirements are complex enough to warrant automation via scripting. It is essential to carefully design, implement, and test the procedures to provide an automatic cutover to a replicating instance within seconds. GT.M provides the required functionality to implement continuous availability via LMS; however, smooth operation requires significant system engineering to implement.

An initial startup of an application instance has no prior status. There are two ways to bring up an LMS application: bring up the originating instance followed by the replicating instance, or bring both up at the same time.

Single-Site

Dual-site - Concurrent Startup

FIS does not recommend concurrent startup of instances in an LMS configuration because of the possibility of a network or timing problem resulting in multiple originating or replicating instances. However, in a dual-site configuration it is possible to concurrently start up the originating and replicating instances. The steps are as follows:

Single-Site to Multi-Site

FIS recommends bringing up the application as a single site first and then bringing up any replicating instances.

Setting up an A->B replication configuration for the first time

Perform the following steps to setup an originating instance Philadelphia and its replicating instance Shanghai.

[Note]Download Example

msr_proc1.tar.gz contains a ready-to-run example of setting up an A->B replication configuration. Download msr_proc1.tar.gz from http://tinco.pair.com/bhaskar/gtm/doc/books/ao/UNIX_manual/msr_proc1.tar.gz. Use the following command sequence to run this example:

#  Philadelphia - originating instance          Shanghai - replicating instance
1  $ source ./env Philadelphia                  $ source ./env Shanghai
2  $ ./repl_setup                               $ ./repl_setup
3  $ ./originating_start Philadelphia Shanghai  $ ./replicating_start Shanghai
4  $ mumps -r %XCMD 'set ^A="Philadelphia"'     $ mumps -r %XCMD 'write ^A'
                                                Philadelphia
5  $ ./originating_stop                         $ ./replicating_stop

This example demonstrates a command sequence that can be used to set up an A->B replication configuration on a local test system. No claim of copyright is made with respect to the scripts used in this example. YOU MUST UNDERSTAND, AND APPROPRIATELY ADJUST, THIS MODULE BEFORE USING IN A PRODUCTION ENVIRONMENT.

Setting up an A->P replication configuration for the first time

Perform the following steps to setup an originating instance Philadelphia and its supplementary instance Perth.

[Note]Download Example

msr_proc2.tar.gz contains a ready-to-run example of setting up an A->P replication configuration. Download msr_proc2.tar.gz from http://tinco.pair.com/bhaskar/gtm/doc/books/ao/UNIX_manual/msr_proc2.tar.gz. Use the following command sequence to run this example:

#  Philadelphia - originating instance             Perth - supplementary instance
1  $ source ./env Philadelphia                     $ source ./env Perth
2  $ ./db_create                                   -
   $ ./repl_setup
3  $ ./originating_start_suppl Philadelphia Perth  -
4  $ ./backup_repl  #Note down the target location of orig_ri_backup
                                                   $ ./db_create
                                                   $ ./suppl_setup /path/to/orig_ri_backup
5  $ mumps -r %XCMD 'set ^A=1'                     $ mumps -r %XCMD 'write ^A set ^B=2 write ^B'
6  -                                               $ ./replicating_stop
7  -                                               $ ./replicating_start_suppl_n Perth  #all subsequent startups of the Receiver Server
8  $ ./originating_stop                            $ ./replicating_stop

This example demonstrates a command sequence that can be used to set up an A->P replication configuration on a local test system. No claim of copyright is made with respect to the scripts used in this example. YOU MUST UNDERSTAND, AND APPROPRIATELY ADJUST, THIS MODULE BEFORE USING IN A PRODUCTION ENVIRONMENT.

Replicating Instance Starts after a Shut Down or Crash

If the replicated regions are untouched after a shutdown or crash, simply restart the replicating instance. It automatically catches up with the originating instance. Perform the following steps to start a replicating instance after a shutdown operation or a crash:

Replicating Instance Starts from a copy of Originating Instance

In this scenario, the database used to start up a replicating instance is logically identical to the database with which the originating instance started up, and has the same journal sequence number. When the replicating instance starts, all transactions applied to the originating instance replicate to the replicating instance as it catches up with the originating instance.

Perform the following steps to start a replicating instance from a copy of the originating instance:

[Note]Note

Cease all GT.M activity in the replicating instance and RUNDOWN any open database files before copying database files onto a replicating instance, or creating new database files and loading them with extracts from the originating instance.

Replicating Instance Starts from Backup of Originating Instance

The more common scenario for bringing up a replicating instance is to take a backup of the originating instance and bring it up as a replicating instance. If the backup is a comprehensive backup, the file headers store the journal sequence numbers.

The backup should use the -newjnlfiles switch of MUPIP backup to create new journal files. Once the replicating instance becomes operational, the Source Server does not need to go back prior to the backup to find transactions to send to the replicating instance.
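For example (the region list and destination directory below are illustrative):

    $ mupip backup -online -newjnlfiles "*" /path/to/backup/dir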

Perform the following steps to start a replicating instance from the backup of an originating instance:

Follow these steps to perform a controlled cutover in which the originating instance and a replicating instance switch roles. Such a step is necessary to test cutover or to bring the originating instance down for maintenance. In the following steps, A is the former originating instance and new replicating instance, while B is the former replicating instance and new originating instance.

On A:

On B:

Network Failures

Perform a cutover from the originating instance to a replicating instance if the network from clients to the originating instance fails, and the network from the clients to the replicating instance is still functioning.

If the network from clients to both the originating and all available replicating instances fails, the application is no longer available.

If the network between the originating instance and replicating instance fails, no action is required to manage GT.M replication. The originating instance continues to make the application available. Replicating instances will catch up with the originating instance when the network is restored.

If the network from the clients to the replicating instance fails, no action is required to manage GT.M replication, although it would be prudent to make the network operational as soon as possible.

Single-Site Failure

Replicating Instance Fails

When the replicating instance comes back up after failure, it will still be the replicating instance. Refer to the preceding "Replicating Instance Starts after a Shut Down or Crash" description for further details.

Originating instance Fails

If the originating instance fails, the replicating instance should take over; when the originating instance comes back up, it should come up as the new replicating instance.

On new originating instance (former replicating instance)

When the new replicating instance (the former originating instance) comes back up, query the originating instance for the journal sequence number at which it became the originating instance (refer to “Displaying/Changing the attributes of Replication Instance File and Journal Pool” for more information), and roll back the replicating instance to this point (refer to “Rolling Back the Database After System Failures” for more information). Reconcile the transactions from the lost transaction file, and possibly the broken transaction file, resulting from the rollback operation.

  • Create new journal files.

  • Start the Source Server in passive mode.

  • Start the Receiver Server to resume replication as the new replicating instance. Dual-site operation is now restored.

  • As appropriate, start the passive application servers.

Various dual-site failure scenarios are possible. Each case is a full system outage - that is, the application is no longer available (which is why multi-site may be attractive). This section identifies the recovery mechanism for each scenario. In each scenario, Site A is initially the originating instance and Site B is initially the replicating instance, before any failure.

Replicating Instance (Site B) Fails First

In the following scenarios, the replicating instance fails first. Once the replicating instance fails, the originating instance continues to operate. The originating instance’s replication Source Server is unable to send updates to the replicating instance. Thus, there is a queue of non-replicated transactions at the failure point on the originating instance. Then, the originating instance fails before the replicating instance recovers, which leads to a dual-site outage. Operating procedures differ according to which site recovers first.

Site A recovers first

On Site A:

On Site B, when it recovers:

Site B recovers first

On Site B:

On Site A, when it recovers:

Originating Instance (Site A) Fails First

In the following scenarios, the originating instance fails first, causing a cutover to Site B. Site B operates as the originating instance and then fails.

Site B Recovers First

On Site B:

  • Roll back the database to the last committed transaction (last application-consistent state).

  • Create new journal files.

  • Start the Source Server.

  • Start the application servers. Application availability is now restored. If the state of the database indicates that batch operations were in process, restart batch operations.

On Site A, when it recovers:

  • Query the originating instance (refer to “Displaying/Changing the attributes of Replication Instance File and Journal Pool” for more information) for the journal sequence number at which it became primary, and roll back the replicating instance to this point. Then apply the lost transaction files to the originating instance for reconciliation/reapplication. The broken transaction file contains information about transactions that never completed and may be useful in researching where work should be restarted. However, broken transactions, unlike the lost transactions, are inherently incomplete.

  • Create new journal files.

  • Start the Source Server in passive mode.

  • Start the Receiver Server to resume replication as the new secondary. Dual-site operation is now restored.

  • Start the passive application servers, as appropriate.

Site A Recovers First

On Site A:

  • Roll back the database to the last committed transaction (last application-consistent state).

  • Create new journal files.

  • Start the Source Server.

  • Start the application servers. Application availability is now restored.

  • If the state of the database indicates that batch operations were in process, restart batch operations.

On Site B, when it recovers:

  • Roll back all transactions that were processed when it was the primary. Transmit the transactions backed out of the database by the rollback to the originating instance for reconciliation/reapplication.

  • Create new journal files.

  • Start the Source Server in passive mode.

  • Start the Receiver Server to resume replication as the new secondary. Dual-site operation is now restored.

  • Start the passive application servers, as appropriate.

A rolling software upgrade provides continuous service while applying a new software release to the system. In other words, a rolling software upgrade removes the need for application downtime: while each system is updated with the new software release, the other system acts as the originating instance during that period.

Dual-Site Rolling Upgrade

This rolling upgrade sequence assumes that there is only one replicating instance, which is down during the upgrade. Thus, if the originating instance goes down during the upgrade, the application becomes unavailable.

[Note]Note

A properly set LMS configuration avoids this exposure.

Assuming that Site A is the originating instance before the upgrade, here are the steps to perform a rolling upgrade:

On Site B:

On Site A:

Rolling SI Replication Upgrade

Although V5.5-000 supports only one source stream for SI replication, the architecture allows for fifteen externally sourced streams (numbers 1 through 15), with stream 0 being locally generated updates. Under normal conditions, on a supplementary instance, one will see updates from streams 0 and 1. If the recipient of an SI stream has been moved from receiving replication from one source to another, you may see other stream numbers (corresponding to updates from other streams) as well.

When adding SI replication, the rules to remember are that (a) both source and receiver sides of SI replication must be V5.5-000, (b) upgrading an instance to V5.5-000 requires a new replication instance file because the replication instance file format for SI is not compatible with those of prior releases and (c) the -updateresync qualifier requires the name of a prior replication instance file when both source and receiver are V5.5-000.

Remember that except where an instance is an unreplicated sole instance, you should upgrade replicating secondary instances rather than originating primary instances. Starting with BC replication (e.g., Ardmore as originating primary and BrynMawr as replicating secondary), the simplest steps to start SI replication to Malvern are:

At your convenience, once BrynMawr is upgraded you can:

To upgrade the replication instance file, perform the following steps:

The -updateresync qualifier indicates that, instead of negotiating a mutually agreed common starting point for synchronization, the operator is guaranteeing that the receiving instance has a valid state that matches the source instance, either currently or at some point in the past. Generally this means the receiving instance has just been updated with a backup copy from the source instance.
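For example, a first-time Receiver Server startup supplying a prior replication instance file might look like the following (the port, log file, and instance-file path are illustrative):

    $ mupip replicate -receiver -start -listenon=4000 -log=/var/log/gtm/receiver.log -updateresync=/backup/source_instance_file.repl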

[Note]Note

A GT.M V5.5-000 instance can source a BC replication stream to, or receive a BC replication stream from, older GT.M releases, subject to limitations as discussed in the Limitations section of this document. It is only for SI replication that source and recipient must both be V5.5-000.

A supplementary instance cannot be the first or sole instance that you upgrade to V5.5-000 - you must already have created an instance running V5.5-000 to provide a replication stream to the supplementary instance.

You can create a supplementary instance from (a) a backup copy of another instance - a supplementary instance, an originating primary, or a replicating secondary - by giving it a new identity, or (b) a freshly created, new instance. An instance used to create a supplementary instance must already be upgraded to V5.5-000.

Starting with a backup of another instance, follow the procedures above under Upgrade Replication Instance File using the -supplementary flag to the mupip replicate -instance_create command.

Creating a supplementary instance from a backup of an existing instance creates an SI replication instance with all the database state in the existing instance and is perhaps the most common application need. But there may be situations when a supplementary instance only needs new data, for example if it is to provide a reporting service only for new customers, and historical data for old customers is just excess baggage. In this case, you can also create a supplementary instance by creating a new instance, pre-loading it with any data required, enabling replication for the database files (since an Update Process will only apply updates to a replicated - and journaled - database region), and following the procedures above under Upgrade Replication Instance File using the -supplementary=on flag to the mupip replicate -instance_create command. Ship a replication instance file from the source to provide the -updateresync=filename qualifier required when starting the Receiver Server for the first time.
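A hedged sketch of creating the replication instance file for a new supplementary instance follows; the instance name, file path, and exact qualifier form should be confirmed against the MUPIP REPLICATE -INSTANCE_CREATE documentation for your GT.M version:

    $ export gtm_repl_instance=/path/to/malvern.repl     # replication instance file for the new instance (illustrative path)
    $ mupip replicate -instance_create -supplementary -name=Malvern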

For starting SI replication on an originating primary supplementary instance P in a B<-A->P<-Q configuration, the procedure is similar to that for a non-supplementary instance, except that one also needs to start a Receiver Server (after having started a Source Server to set up the Journal Pool) to receive updates from the corresponding non-supplementary instance (A or B in this case). Similarly, as part of shutting down the originating primary supplementary instance, an added step is to shut down the Receiver Server (if it is up and running) before shutting down the Source Server.

Remember that for GT.M replication, the Receiver Server listens at a TCP port and the Source Server connects to it. If the Receiver Server is not ahead of the Source Server, replication simply starts and streams updates from the source to the receiver. When the Receiver Server is ahead of the Source Server, the cases are different for BC and SI replication.

For either BC or SI replication, if the Receiver Server is started with the -autorollback qualifier, it performs an online rollback of the receiving instance, so that it is not ahead of the originating instance, creating an Unreplicated Transaction Log of any transactions that are rolled off the database. When started without the -autorollback qualifier, a Receiver Server notified by its Source Server of a rollback logs the condition and exits so that an operator can initiate appropriate steps.

For SI replication the -noresync qualifier tells the Receiver Server not to rollback the database even if the receiver is ahead of the source. In this case, the Source Server starts replication from the last journal sequence number common to the two instances.

If you notice the replication WAS_ON state, correct the cause that made GT.M turn journaling off and then execute MUPIP SET -REPLICATION=ON.
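For example (the region list is illustrative):

    $ mupip set -replication=on -region "*"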

To make storage space available, first consider moving unwanted non-journaled and temporary data. Then consider moving the journal files that predate the last backup. Moving the currently linked journal files is a very last resort because it disrupts the back links and a rollback or recover will not be able to get back past this discontinuity unless you are able to return them to their original location.

If the replication WAS_ON state occurs on the originating side:

If the Source Server does not reference any missing journal files, -REPLICATION=ON resumes replication with no downtime.

If the Source Server requires any missing journal file, it produces a REPLBRKNTRANS or NOPREVLINK error and shuts down. Note that you cannot roll back after journaling has been turned off because there is insufficient information to perform such a rollback.

In this case, proceed as follows:

If the replication WAS_ON state occurs on the receiving side:

Execute MUPIP SET -REPLICATION=ON to return to the replication ON state. This resumes normal replication on the receiver side. As an additional safety check, extract the journal records of updates that occurred during the replication WAS_ON state on the originating instance and randomly check whether those updates are present in the receiving instance.

If replication does not resume properly (due to errors in the Receiver Server or Update Process), proceed as follows:

When replication is ON, GT.M increments the Region Seqno field in the database file header for every update that occurs on a region. When replication is OFF, database updates do not affect the Region Seqno field. Except for step 4, no other step affects the Region Seqno field. When you set the Region Seqno in step 4, you instruct GT.M to resume receiving replication from the point at which replication was turned off in step 1.

tlsrepl.tar.gz contains scripts that create two databases and set up a secure (TLS/SSL) replication connection between them. Download tlsrepl.tar.gz from http://tinco.pair.com/bhaskar/gtm/doc/books/ao/UNIX_manual/tlsrepl.tar.gz and follow the instructions below. The scripts in tlsrepl.tar.gz are solely for the purpose of explaining the general steps required to encrypt replication data in motion. You must understand, and appropriately adjust, the scripts before using them in a production environment. The following steps create two instances and a basic framework required for setting up a TLS replication connection between them. Note that all certificates created in this example are solely for the purpose of explaining their roles in a TLS replication environment. For practical applications, use certificates signed by a CA whose authority matches your use of TLS.

On instance Helen:

  1. Specify the value of $gtm_dist and $gtmroutines in the env script and execute the following command:

    $ source ./env Helen

    This creates a GT.M environment for replication instance name Helen. When prompted, enter a password for gtmtls_passwd_Helen.

  2. Create a certificate directory setup in the current directory and a self-signed root certification authority.

    $ ./cert_setup

    This creates a self-signed root certification authority (ca.crt) in $PWD/certs/. ca.crt needs to be available on both Source and Receiver Servers. In a production environment, ensure that you keep root-level certificates in directories with 0500 permissions and the individual files with 0400 permissions so that unauthorized users cannot access them.

    The output of this script is like the following:

    Generating RSA private key, 512 bit long modulus
    ...........++++++++++++
    ...............++++++++++++
    e is 65537 (0x10001)
    Enter pass phrase for /home/jdoe/certs/ca.key:
    Verifying - Enter pass phrase for /home/jdoe/certs/ca.key:
    Enter pass phrase for /home/jdoe/certs/ca.key:
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    Country Name (2 letter code) [US]:US
    State or Province Name (full name) [Philadelphia]:Philadelphia
    City (e.g., Malvern) [Malvern]:Malvern
    Organization Name (eg, company) [FIS]:FIS
    Organizational Unit Name (eg, section) [GT.M]:GT.M
    Common Name (e.g. server FQDN or YOUR name) [localhost]:fis-gtm.com
    Email Address (e.g. helen@gt.m) []:root@gt.m
  3. Create the global directory and the database for instance Helen.

    $ ./db_setup

    This also creates a leaf-level certificate for Helen. The openssl.cnf used to create this example is available in the download for your reference. Ensure that your openssl.cnf file is configured according to your environment and security requirements.

    The output of this script is like the following:

    Generating RSA private key, 512 bit long modulus
    .++++++++++++
    .......++++++++++++
    e is 65537 (0x10001)
    Enter pass phrase for /home/jdoe/certs/Helen.key: [Specify the password that you entered for gtmtls_passwd_Helen]
    Verifying - Enter pass phrase for /home/jdoe/certs/Helen.key:
    Enter pass phrase for /home/jdoe/certs/helen.key:
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    Country Name (2 letter code) [US]:
    State or Province Name (full name) [Philadelphia]:Illinois
    City (e.g., Malvern) [Malvern]:Chicago
    Organization Name (eg, company) [FIS]:FIS
    Organizational Unit Name (eg, section) [GT.M]:GT.M
    Common Name (e.g. server FQDN or YOUR name) [localhost]:fisglobal.com
    Email Address (e.g. helen@gt.m) []:helen@gt.m
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:
    Using configuration from /usr/lib/ssl/openssl.cnf
    Enter pass phrase for ./certs/ca.key:
    Check that the request matches the signature
    Signature ok
    Certificate Details:
     Serial Number: 14 (0xe)
     Validity
     Not Before: Jun 11 14:06:53 2014 GMT
     Not After : Jun 12 14:06:53 2014 GMT
     Subject:
     countryName = US
     stateOrProvinceName = Illinois
     organizationName = FIS
     organizationalUnitName = GT.M
     commonName = fisglobal.com
     emailAddress = helen@gt.m
     X509v3 extensions:
     X509v3 Basic Constraints:
     CA:FALSE
     Netscape Comment:
     OpenSSL Generated Certificate
     X509v3 Subject Key Identifier:
     96:FD:43:0D:0A:C1:AA:6A:BB:F3:F4:02:D6:1F:0A:49:48:F4:68:52
     X509v3 Authority Key Identifier:
     keyid:DA:78:3F:28:8F:BC:51:78:0C:5F:27:30:6C:C5:FE:B3:65:65:85:C9
    
    Certificate is to be certified until Jun 12 14:06:53 2014 GMT (1 days)
    Sign the certificate? [y/n]:y
    
    
    1 out of 1 certificate requests certified, commit? [y/n]y
    Write out database with 1 new entries
    Data Base Updated
  4. Create the $gtmcrypt_config file. Specify the names of all participating instances.

    $ ./gen_gc Helen Phil
  5. Turn replication on and create the replication instance file:

    $ ./repl_setup
  6. Start the originating instance Helen:

    $ ./originating_start Helen Phil

On instance Phil:

  1. Specify the value of $gtm_dist and $gtmroutines in the env script and execute the following command:

    $ source ./env Phil

    This creates a GT.M environment for replication instance name Phil. When prompted, enter a password for gtmtls_passwd_Phil.

  2. Create the global directory and the database for instance Phil.

    $ ./db_setup

    This also creates a leaf-level certificate for Phil. The openssl.cnf used to create this example is available in the download for your reference. Ensure that your openssl.cnf file is configured according to your environment and security requirements.

  3. Ensure that ca.key generated on instance Helen is available on instance Phil.

  4. Create the $gtmcrypt_config file. Specify the names of all participating instances.

    $ ./gen_gc Helen Phil
  5. Turn replication on and create the replication instance file:

    $ ./repl_setup
  6. Start the replicating instance Phil.

    $ ./replicating_start 

For subsequent environment setup, use the following commands:

source ./env Phil or source ./env Helen
./replicating_start or ./originating_start Helen Phil


[4] The reason for the "almost completely" qualification is that the time to recover some older file systems can depend on the amount of space used.
