Transaction Processing

Transaction Processing (TP) provides a way for M programs to organize database updates into logical groups that occur as a single event (i.e., either all the database updates in a transaction occur, or none of them occur). No other process may behave as if it observed any intermediate state.

Transaction processing has been designed to improve throughput and minimize the overhead of "live lock" conditions. The number of attempts to complete the transaction is limited to four. The fourth attempt is made inside a "critical section" with all other processes temporarily locked out of the database. Between the second and third tries, GT.M waits for a random interval between 0 and 500 milliseconds.

TP Definitions

In M, a transaction is a sequence of commands that begins with a TSTART command, ends with a TCOMMIT command, and is not within the scope of another transaction.

A successful transaction ends with a COMMIT that is triggered by the TCOMMIT command at the end of the transaction. A COMMIT causes all the database updates performed within the transaction to become available to other processes.

An unsuccessful transaction ends with a ROLLBACK. ROLLBACK is invoked explicitly by the TROLLBACK command, or implicitly at a process termination that occurs during a transaction in progress. An error within a transaction does not cause an implicit ROLLBACK. A ROLLBACK removes any database updates performed within the transaction before they are made available to other processes. ROLLBACK also releases all resources LOCKed since the start of the transaction, and makes the naked reference undefined.

A RESTART is a transfer of control to the TSTART at the beginning of the transaction. RESTART implicitly includes a ROLLBACK and may optionally restore local variables to the values they had when the initial TSTART was originally executed. A RESTART always restores $TEST and the naked reference to the values they had when the initial TSTART was executed. RESTART does not manage device state information. A RESTART is invoked by the TRESTART command or by M if it is determined that the transaction is in conflict with other database updates. RESTART can only successfully occur if the initial TSTART includes an argument that enables RESTART.
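
As a minimal sketch of these definitions (not taken from the GT.M distribution; the label rstdemo and the local variable names are hypothetical), the following routine lists one local variable in the TSTART argument and forces two RESTARTs with TRESTART. Assuming the behavior described above, saved should return to its TSTART-time value on each pass, while unsaved, which is not listed, should keep whatever value it had reached.

rstdemo ; hypothetical label, for illustration only
 NEW saved,unsaved
 SET saved=1,unsaved=1
 TSTART (saved):SERIAL ; only saved is listed for restoration on RESTART
 WRITE !,"$TRESTART=",$TRESTART," saved=",saved," unsaved=",unsaved
 SET saved=saved+1,unsaved=unsaved+1
 IF $TRESTART<2 TRESTART  ; explicit RESTART on the first two passes
 TCOMMIT
 QUIT

Because the TSTART argument is present, RESTART is enabled; a TSTART with no argument would not allow the TRESTART to succeed.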

TP Characteristics

Most transaction processing systems try to have transactions that meet the "ACID" test – Atomic, Consistent, Isolated, and Durable.

To provide ACID transactions, GT.M uses a technique called optimistic concurrency control. Each block has a transaction number that GT.M sets to the current database transaction number when updating the block. Application logic brackets transactions with TSTART and TCOMMIT commands. Once inside a transaction, a GT.M process tracks each database block that it reads (any database block that it intends to update has to be read first) and keeps, in process-private memory, a list of the updates that it intends to apply. Application logic within the process views the database with those updates; application logic in other processes does not see states internal to the transaction. At TCOMMIT time, the process checks whether any blocks have changed since it read them. If none have changed, it commits the transaction, making its changes visible to other processes Atomically with Isolation and Consistency (Durability comes from the journal records written at commit time).

If one or more blocks have changed, the process reverts its state to the TSTART and re-executes the application code for the transaction. If it fails to commit the second time, it tries yet again. If it fails to commit on the third attempt, it locks other processes out of the database and executes the transaction as the sole process (that is, on the fourth attempt, it switches from an optimistic approach to a pessimistic one).

This technique normally works very well and is one of the factors that allow GT.M to excel at transaction processing throughput.

Pathological cases occur when processes routinely modify blocks that other processes have read (called "collisions"), resulting in frequent transaction restarts. Collisions can be legitimate or accidental. Importantly, the longer that a transaction is "open" (the "collision window," when the application logic is between TSTART and TCOMMIT), the greater the probability that a collision will require a transaction restart.

Legitimate collisions can result from normal business activity, for example, if two joint account holders make simultaneous ATM withdrawals from a joint account. When the time an application takes to process each transaction is a minuscule fraction of a second, the probability of a collision is very low, and in the rare case where one occurs, the restart mechanism handles it well. An example with a higher probability of collision comes from commercial accounts, where a large enterprise may have tens to hundreds of accounts, individual transactions may hit multiple accounts, and during the business day many people may execute transactions against those accounts. Again, the small collision window means that collisions remain rare and the restart mechanism handles them well when they occur.

Legitimate (from a GT.M point of view) collisions can also occur as a consequence of application design. For example, if an application has an application-level transaction journal that every process appends to, then that design will likely result in high rates of collisions, creating a pathological case where every transaction fails three times and then commits on the fourth attempt with all other processes locked out. The way to avoid these is to adjust the application design, either to use M LOCKs to gate such "hot spots" or, better, to give each process its own update space which, at some event, a single process then consolidates.
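
The two alternatives can be sketched as follows. This is an illustration under assumptions, not code from GT.M or from any particular application: the globals ^applog and ^joblog, the labels logshared and logown, and the record layout are all hypothetical.

logshared(rec) ; every process appends to one shared list, gated by a LOCK
 NEW seq
 LOCK +^applog ; serialize access to the hot spot before opening the transaction
 TSTART ():SERIAL
 SET seq=$GET(^applog(0))+1
 SET ^applog(0)=seq,^applog(seq)=rec
 TCOMMIT
 LOCK -^applog
 QUIT
 ;
logown(rec) ; each process appends under its own $JOB subscript
 NEW seq
 TSTART ():SERIAL
 SET seq=$GET(^joblog($JOB,0))+1
 SET ^joblog($JOB,0)=seq,^joblog($JOB,seq)=rec
 TCOMMIT
 QUIT

In the second form no two processes update the same counter node, so the design no longer makes every transaction collide (accidental collisions in shared blocks remain possible). The consolidation step, in which a single process later merges the per-process entries, is not shown.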

Accidental collisions result when two processes access unrelated data that happens to reside in the same data block. For example, in a global indexed by last name, the data nodes for two account holders whose last names start with the same letter may reside in the same block, so concurrent updates to their records can collide. Because the path to many data blocks typically passes through one index block, data additions cause changes in index blocks and can generate accidental collisions. While it is not possible to avoid accidental collisions (especially in blocks containing metadata such as index blocks), they are rare and the occasional collision is handled well by the restart mechanism.

Application design that keeps transactions open for long periods of time can cause pathological rates of accidental collision. When a process tries to run an entire report in a transaction, instead of the transaction taking a fraction of a second (remember that transactions are intended to be atomic), the report takes seconds or even minutes and effectively ensures collisions and restarts. Furthermore, since the probability of collisions is high, the probability of these long-running transactions executing the fourth retry (with other processes shut out) goes up, and when that happens, the system appears to respond erratically, or hang temporarily.

GT.M provides a transaction timeout feature that interrupts long-running transactions in order to limit their impact on the system, and the consequent user perception of erratic system response times and temporary hangs. Calls to an external library, say to access a web service, can subvert the timeout mechanism when the external library uses an uninterruptible system call. If such a web service uses an adjacent server that responds immediately, the web service is wholesome. But if the web service accesses a remote server without a guaranteed short response time, then collisions may be frequent, and if a process in the fourth retry waits for a web service that never responds, it brings the entire application to a standstill.

[Tip] Implementing Web Services Safely

To safely implement web services inside a transaction, an application must implement a guaranteed upper bound on the time taken by the service. The story or use case for each circumstance determines the appropriate timeout for the corresponding transaction. For example, if the web service is to authorize a transaction, there might be a 500 millisecond timeout with the authorization refused if the approval service does not respond within that time.

There are two approaches to implementing web services with a timeout.

  1. For applications that call out to C code, the C code must guarantee a return within a time limit, using a wrapper if necessary. GT.M provides functions that external C code can use to implement timers. If the call is to an unknown library, or one without a way to guarantee a timeout, the external C code may need to create an intermediate proxy that can provide a timeout to GT.M.

  2. Because web services are usually implemented by a known protocol layered on TCP/IP and GT.M provides a SOCKET device for TCP/IP connections, implement the call out to the web service using a GT.M SOCKET device. GT.M can then enforce the TP timeout mechanism, which it cannot for an external call, especially one that calls via a library into an uninterruptible OS service.
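
A minimal sketch of the second approach follows, assuming a line-oriented request/reply protocol. The host name, port, device name authsvc, and label authreq are hypothetical, and the one-second timeouts are placeholders for whatever bound the use case requires. Each potentially blocking operation (OPEN and READ) carries its own timeout, so the extrinsic returns within a bounded time whether or not the service answers.

authreq(req) ; hypothetical extrinsic: returns the reply line, or "" on timeout
 NEW dev,io,reply
 SET io=$IO,dev="authsvc",reply=""
 ; connect with a timeout; $TEST reports whether the OPEN succeeded
 OPEN dev:(CONNECT="auth.example.com:8080:TCP":DELIMITER=$CHAR(10)):1:"SOCKET"
 QUIT:'$TEST ""
 USE dev
 WRITE req,! ; send the request followed by the delimiter
 READ reply:1 ; wait at most one second for the reply
 SET:'$TEST reply="" ; treat a timed-out READ as no answer
 USE io
 CLOSE dev
 QUIT reply

A caller would treat an empty return value as a refusal, for example IF $$authreq(req)="" followed by the application's decline logic.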

To conform with the M approach of providing maximum flexibility and, when possible, backwards compatibility with older versions of the standard, M transaction processing requires the use of programming conventions that meet the ACID test.

For example, some effects of the BREAK, CLOSE, JOB, OPEN, READ, USE, WRITE, and ZSYSTEM commands may be observed by parties to the system. Because the effects of these commands might cause an observing process or person to conclude that a transaction executing them was in progress and perhaps finished, they violate, in theory, the principle of Isolation.

The LOCK command is another example. A program may attempt to use a LOCK to determine if another process has a transaction in progress. The answer would depend on the management of LOCKs within transactions, which is implementation-specific. This would therefore clearly violate the principle of Isolation. The LOCK command is discussed later in this section.

The simplest way to construct a transaction that meets the ACID test is not to use any commands within a transaction whose effects may be immediately "visible" outside the transaction. Unfortunately, because M applications are highly interactive, this is not entirely straightforward. When a user interaction relies on database information, one solution is for the program to save the initial values of any global values that could affect the outcome, in local variables. Then, once the interaction is over and the transaction has been initiated, the program checks the saved values against the corresponding global variables. If they are the same, it proceeds. If they differ, some other update has changed the information, and the program must issue a TROLLBACK, and initiate another interaction as a replacement.
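
The save-then-verify convention might look like the following sketch, in which the global ^acct and the label chgacct are hypothetical. The caller reads old with $GET(^acct(id)) before the user interaction, outside any transaction, and passes it in together with the new value.

chgacct(id,old,new) ; hypothetical extrinsic: returns 1 if committed, 0 if the data changed
 TSTART ():SERIAL
 ; verify that no other update changed the node during the interaction
 IF $GET(^acct(id))'=old TROLLBACK  QUIT 0
 SET ^acct(id)=new
 TCOMMIT
 QUIT 1

When the function returns 0, the caller repeats the interaction using the current value of the node.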

Even when the "visible" commands appear within a transaction, an M application may provide wholesome operation by relying on additional programming or operating conventions.

A program using LOCKs to achieve serializability relies on properly designed and universally followed LOCKing conventions to achieve Isolation with respect to database operations. LOCKs placed outside the transaction (usually a LOCK immediately before the TSTART and an unlock immediately after the TCOMMIT) achieve serializability by actually serializing any approximately concurrent transaction. LOCKs placed inside the transaction (frequently a LOCK immediately after the TSTART and an unlock immediately before the TCOMMIT) signal M to ensure that no operations using the same LOCK resource(s) overlap. Within a transaction, an M implementation may defer both LOCKing and unlocking to achieve its goal of serializability. A program using TSTARTs with the SERIAL keyword replaces the convention with a guarantee from M that all the database activity of the transaction meets the test of Isolation with respect to database activity.
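
The three conventions can be laid side by side in a short sketch. The global ^acct and the locals id and new are hypothetical, and the fragments are alternatives rather than one program.

 ; Convention 1: LOCK outside the transaction
 LOCK +^acct(id)
 TSTART ()
 SET ^acct(id)=new
 TCOMMIT
 LOCK -^acct(id)
 ;
 ; Convention 2: LOCK inside the transaction
 TSTART ()
 LOCK +^acct(id)
 SET ^acct(id)=new
 LOCK -^acct(id)
 TCOMMIT
 ;
 ; SERIAL keyword: M guarantees Isolation, no LOCKing convention required
 TSTART ():SERIAL
 SET ^acct(id)=new
 TCOMMIT

The first two forms are only as good as the discipline with which every program touching ^acct follows the same convention.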

In GT.M the Durability aspect of the ACID properties relies on the journaling feature. When journaling is on, every transaction is recorded in the journal file as well as in the database. The journal file constitutes a serial record of database actions and states. It is always written before the database updates and is designed to permit recovery of the database if the database should be damaged. By default when a process commits a transaction, it does not return control to the application code until the transaction has reached the journal file. The exception to this is that when the TSTART specifies TRANSACTIONID="BATCH" the process resumes application execution without waiting for the file system to confirm the successful write of the journal record. The idea of the TRANSACTIONID="BATCH" has nothing inherently to do with "batch" processing - it is to permit maximum throughput for transactions where the application has its own check-pointing mechanism, or method of recreating the transaction in case of a failure. The real durability of transactions is a function of the durability of the journal files. Putting journal files on reliable devices (RAID with UPS protection) and eliminating common points of failure with the path to the database (separate drives, controllers, cabling) improve durability. The use of the replication feature can also improve durability by moving the data to a separate site in real time.
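
As a brief illustration of the keyword, the following fragment combines SERIAL with TRANSACTIONID="BATCH". The global ^stage and the locals id and rec are hypothetical, and the sketch assumes the application can recreate the update after a failure.

 TSTART ():(SERIAL:TRANSACTIONID="BATCH")
 SET ^stage(id)=rec
 TCOMMIT ; control returns without waiting for confirmation of the journal write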

Attempting to QUIT (implicitly or explicitly) from code invoked by a DO, XECUTE, or extrinsic after that code issued a TSTART not yet matched by a TCOMMIT, produces an error. Although this is a consequence of the RESTART capability, it is true even when that capability is disabled. For example, this means that an XECUTE containing only a TSTART fails, while an XECUTE that performs a complete transaction succeeds.
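
The XECUTE case can be tried with two lines such as these, where ^demo is a hypothetical global. The comments restate the behavior described above rather than reporting the output of an actual run.

 XECUTE "TSTART ():SERIAL SET ^demo(1)=1 TCOMMIT" ; complete transaction: expected to succeed
 XECUTE "TSTART ():SERIAL" ; TSTART still unmatched at the implicit QUIT: expected to produce an error

After experimenting with the second line, checking $TLEVEL and issuing a TROLLBACK if it is non-zero is a reasonable way to return the session to a clean state.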

TP Performance

To achieve the best GT.M performance, transactions should:

  • be as short as possible

  • consist, as much as possible, only of global updates

  • be SERIAL with no associated LOCKs

  • have RESTART enabled with a minimum of local variables protected by a restart portion of the TSTART argument

Large concurrent transactions using TCOMMIT may result in repeated and inefficient attempts by competing processes to capture needed scarce resources, resulting in poor performance.

Example:

 TSTART ():SERIAL
 SET (ACCT,^M(0))=^M(0)+1
 SET ^M(ACCT)=PREC,^PN(NAM)=ACCT
 TCOMMIT

This transaction encapsulates two SET commands that make three assignments. The first SET increments the tally of patients registered, storing the number in local variable ACCT for faster access in the current program, and in global variable ^M(0). The second SET stores a patient record by account number and then cross-references the account number with the patient name. Placing the SETs within a single transaction ensures that the database always receives either all of the updates or none of them, thus protecting database integrity against process or system failure. Similarly, another concurrent process, whether using transactions or not, never finds one of the updates in place without also finding the others.

Example:

 TSTART ():SERIAL
 IF $TRESTART>3 DO  QUIT
 .TROLLBACK
 .WRITE !,"Too many RESTARTs"
 .QUIT
 SET (NEXT,^ID(0))=^ID(0)+1
 SET ^ID(NEXT)=RECORD,^XID(ZIP,NEXT)=""
 TCOMMIT

This transaction will automatically restart if it cannot serialize the SETs to the database, and will terminate with a TROLLBACK if more than 3 RESTARTs occur.

GT.M provides a way to monitor transaction restarts by reporting them to the operator logging facility. If the environment variable gtm_tprestart_log_delta is defined, GT.M reports every Nth restart where N is the numeric evaluation of the value of gtm_tprestart_log_delta. If the environment variable gtm_tprestart_log_first is defined, the restart reporting begins after the number of restarts specified by the value of gtm_tprestart_log_first. For example, defining both environment variables to the value 1 causes all TP restarts to be logged. When gtm_tprestart_log_delta is defined, leaving gtm_tprestart_log_first undefined is equivalent to giving it the value 1.

[Note] Note

For more information on enhancements related to TP performance, see the "NOISOLATION" section under the VIEW command topic in Chapter 6: "Commands".

TP Example

Here is a transaction processing example that lets you exercise the concept. If you use this example, be mindful that the functions "doit" and "trestart" are included as tools to allow you access to information within a transaction which would normally be hidden from users. These types of functions would not normally appear in production code. Comments have been inserted into the code to explain the function of various segments.

trans
 ;This sets up the program constants
 ;for doit and trestart
 n
 s $p(peekon,"V",51)=""
 s $p(peekon,"V",25)="Peeking inside Job "_$j
 s $p(peekoff,"^",51)=""
 s $p(peekoff,"^",25)="Leaving peeking Job "_$j
 ;This establishes the main loop
 s CNFLTMSG="Conflict, please reenter"
 f  r !,"Name: ",nam q:'$l(nam)  d
 .i nam="?" d  q
 ..w !,"Current data in ^trans:",! d:$d(^trans)  q
 ...zwrite ^trans
 .f  s ok=1 d  q:ok  w !,$C(7),CNFLTMSG,$C(7),!
 ..s old=$g(^trans(nam),"?")
 ..i old="?" w !,"Not on file" d  q
 ...;This is the code to add a new name
 ...f  d  q:data'="?"
 ....r !,"Enter any info using '#' delimiter: ",!,data
 ...i data="" w !,"No entry made for ",nam q
 ...TSTART ():SERIAL i $$trestart ;$$trestart for demo
 ...i $d(^trans(nam)) s ok=^trans(nam)=data TRO  q
 ...s ^trans(nam)=data
 ...TCOMMIT:$$doit  ;$$doit for demo
 ..;This is the beginning of the change and delete loop
 ..f  d  q:fld=+fld!'$l(fld)  w " must be numeric"
 ...w !,"Current data: ",!,old
 ...r !,"Piece no. (negative to delete record) : ",fld
 ..i 'fld w !,"no change made" q
 ..;This is the code to delete a name
 ..i fld<0 d  q  ; delete record
 ...f  d  q:"YyNn"[x
 ....w !,"Ok to delete ",nam," Y(es) or N(o) <N>? "
 ....r x s x=$e(x)
 ...i "Yy"'[x!'$l(x) w !,"No change made" q
 ...TSTART ():SERIAL i $$trestart ;$$trestart for demo
 ...i $g(^trans(nam),"?")'=old TROLLBACK  s ok=0 q
 ...kill ^trans(nam)
 ...TCOMMIT:$$doit  ;$$doit for demo
 ..;This is the code to change a field
 ..f  r !,"Data: ",data q:data'="?"&(data'["#")  d
 ...w " must not be a single '?' or contain any '#'"
 ..TSTART ():SERIAL i $$trestart ;$$trestart for demo
 ..i '$d(^trans(nam)) s ok=0 TROLLBACK  q
 ..;Roll back if another process changed this piece since it was displayed
 ..i $p(^trans(nam),"#",fld)'=$p(old,"#",fld) d  q
 ...s ok=$p(^trans(nam),"#",fld)=data TROLLBACK
 ..s $p(^trans(nam),"#",fld)=data
 ..TCOMMIT:$$doit  ;$$doit for demo
 q
doit()
 ;This inserts delay and an optional
 ;rollback only to show how it works
 w !!,peekon d disp
 f  d  q:"CR"[act
 .r !,"C(ommit), R(ollback), or W(ait) <C>? ",act
 .s act=$tr($e(act),"cr","CR")
 .i act="?" d disp
 i act="R" TROLLBACK  w !,"User requested DISCARD"
 w !,peekoff,!
 q $TLEVEL

trestart()
 ;This is only to show what is happening
 i $TRESTART d
 .w !!,peekon,!,">>>RESTART<<<",! d disp w !,peekoff,!
 q 1

disp
 w !,"Name: ",nam
 w !,"Original data: ",!,old,!,"Current data: "
 w !,$g(^trans(nam),"KILLED!")
 q

Generally, this type of program would be receiving data from multiple sessions into the same global.