23.5.2 Logging Mechanism

During normal operation, the Logging Mechanism records the state and actions of a member of an object group in a log, as shown in Figure 23-12 on page 23-83. The state and actions correspond to messages sent and received by the member of the object group. Conceptually, the Fault Tolerance Infrastructure maintains a distinct log for each object group, although it may record the logs for many object groups within the same physical log. The log may be distributed, in which case it is maintained in local volatile storage at each member of the object group that is the destination of the message. The distributed logging strategy typically employs a reliable totally-ordered multicast protocol to deliver the messages to all of the members of the object group. Alternatively, particularly for passively replicated object groups, the log may be written to shared stable storage by the primary member of the object group that is the source of the message. To be sound, the shared logging strategy requires that each message is forced to the log on stable storage before it is transmitted, which may have an adverse effect on performance.

The format of the log is not specified in this specification. Typically, the information recorded in the log consists of request and reply messages, and states and updates in the form of get_state() and get_update() request and reply messages, as shown in Figure 23-12 on page 23-83. The log must preserve the order in which messages were received by the members of the object group, so that they can be replayed in the correct order during recovery. States and updates must be positioned logically in the message sequence at the point at which they were requested by the get_state() or get_update() request message, even though the state or update may be contained in a reply message that is sent at a later time. A complete state consists of the get_state() request message and the reply to that request. A complete update is defined similarly.

To conserve memory, the Logging Mechanism must prune the log of records that the Recovery Mechanism will not subsequently require for recovery. Thus, if the log contains a complete state, the Logging Mechanism can discard all log records prior to the get_state() request message for that state. Similarly, if a log contains a complete update, the Logging Mechanism can discard all request and reply messages, other than those associated with the logging of a state or update, that precedes the get_update() request message for that update. If, however, a request contains an FT_REQUEST service context, which defines an expiration time for the request, the request and its matching reply must be retained until that expiration time.