IBM Global Mailbox - High availability properties and limitations
High availability of Global Mailbox is limited in some situations, for example when a data center goes down or when one or more nodes in a data center are not operational. The properties and limitations are described for the following areas:
- Protocol actions or trading partner actions (upload or download of files)
- Message processing
- Global Mailbox management node user interface
- Payload replication
- Command line utilities
A protocol action or trading partner action is the upload or download of a message or file to or from Global Mailbox by using one of the following protocols:
- File Transfer Protocol (FTP)
- Secure File Transfer Protocol (SFTP)
The availability of some Global Mailbox components affects the availability of protocol actions. The following sections explain this in detail:
How Cassandra impacts availability of protocol actions
With delayed (asynchronous) replication (the default setting), 50% + 1 of the nodes in the data center where the action is performed must be available. For example, if your data center has 3 Cassandra nodes, then 50% + 1 is 1.5 + 1 = 2.5, which is rounded down to 2. Therefore, 2 Cassandra nodes must be available for the protocol action to complete. The nodes in other data centers are not needed for the request to succeed.
As another example, if your data center has 4 Cassandra nodes, then 50% + 1 is 2 + 1 = 3. Therefore, 3 Cassandra nodes must be available for the protocol action to complete.
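The 50% + 1 rule amounts to integer division by two, plus one. The following Python sketch is illustrative only; the function names are hypothetical and are not part of Global Mailbox or Cassandra.

```python
def local_quorum(total_nodes: int) -> int:
    """Nodes that must be available: 50% + 1, rounded down."""
    return total_nodes // 2 + 1


def delayed_replication_ok(available_local: int, total_local: int) -> bool:
    """Delayed (asynchronous) replication only needs a quorum of the
    Cassandra nodes in the data center where the action is performed."""
    return available_local >= local_quorum(total_local)


if __name__ == "__main__":
    for total in (3, 4):
        print(f"{total} nodes in the data center: {local_quorum(total)} must be available")
```

With 3 nodes the function returns 2, and with 4 nodes it returns 3, matching the two examples above.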
With immediate (synchronous) replication, 50% + 1 of the nodes across the data centers must be available, and two of the available nodes must be in the data center where a user is doing an upload or download. Consider an environment with 3 nodes in data center 1 and 3 nodes in data center 2. In total, there are 6 nodes across the two data centers. To meet the 50% + 1 requirement, 4 nodes are required. Of those 4 nodes, 2 must be in the data center where the user is uploading or downloading a file.
Immediate replication requires that the file be replicated to at least one other data center. To achieve this, the other data center must have 50% + 1 of its nodes up and running. If the nodes are not available, the upload fails.
In both scenarios, 2 nodes must be available in the data center where a user is performing an action. To ensure that Global Mailbox can survive the loss of a node in a data center, an additional node is needed. This means that each data center must have a minimum of 3 nodes.
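The immediate replication rules can be combined into a single check. The following Python sketch is one reading of the rules above, not product code: the function name, the dictionary layout, and the `min_local` parameter are assumptions made for illustration.

```python
from typing import Mapping


def quorum(total_nodes: int) -> int:
    """50% + 1 of the nodes, rounded down."""
    return total_nodes // 2 + 1


def immediate_replication_ok(
    available: Mapping[str, int],
    total: Mapping[str, int],
    local_dc: str,
    min_local: int = 2,  # assumption: the "two local nodes" from the 3 + 3 example
) -> bool:
    """Check the Cassandra conditions described for immediate replication."""
    # 1. 50% + 1 of the nodes across all data centers must be available.
    if sum(available.values()) < quorum(sum(total.values())):
        return False
    # 2. Two of the available nodes must be in the local data center.
    if available[local_dc] < min_local:
        return False
    # 3. At least one other data center must have 50% + 1 of its own nodes up.
    return any(
        available[dc] >= quorum(total[dc]) for dc in total if dc != local_dc
    )


if __name__ == "__main__":
    total = {"dc1": 3, "dc2": 3}
    print(immediate_replication_ok({"dc1": 2, "dc2": 2}, total, "dc1"))
    print(immediate_replication_ok({"dc1": 3, "dc2": 1}, total, "dc1"))
```

In the 3 + 3 example, the first call passes all three conditions. The second call has 4 nodes available overall, but data center 2 is below its own 50% + 1 threshold, so the upload would fail.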
How ZooKeeper nodes impact availability of protocol actions
During normal functioning of Global Mailbox, a minimum of 2 ZooKeeper servers must be operational across the data centers for trading partner actions or protocol actions to complete. The ZooKeeper watchdog process must also be running on all ZooKeeper nodes so that ZooKeeper failures can be detected and recovery with a reduced ensemble can be initiated. While the watchdog is setting up the reduced ensemble, there is a window of time in which trading partner actions fail. The actions proceed after the reduced ensemble is started. If the number of running ZooKeeper servers drops to 1 or 0, all trading partner actions fail until the number of running ZooKeeper servers returns to 2 or more.
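The ZooKeeper conditions reduce to a simple threshold plus the recovery window. The sketch below is illustrative only; the constant, the function name, and the `reduced_ensemble_starting` flag are hypothetical and do not correspond to any Global Mailbox or ZooKeeper API.

```python
MIN_ZOOKEEPER_SERVERS = 2  # minimum running servers across the data centers


def trading_partner_actions_available(
    running_servers: int, reduced_ensemble_starting: bool = False
) -> bool:
    """Return True if trading partner actions are expected to complete."""
    # Actions fail while the watchdog is still setting up the reduced ensemble.
    if reduced_ensemble_starting:
        return False
    # With 1 or 0 running servers, all trading partner actions fail.
    return running_servers >= MIN_ZOOKEEPER_SERVERS


if __name__ == "__main__":
    for running in (0, 1, 2, 3):
        print(running, trading_partner_actions_available(running))
```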
How shared disk impacts availability of protocol actions
When the Sterling B2B Integrator Global Mailbox Client Adapter starts, it needs to load the configuration files from the shared disk. If the shared disk fails before the configuration files are loaded, protocol actions cannot be completed. If the shared disk fails after the configuration files are loaded, protocol actions for files of any size fail unless a session was already active before the shared disk became unavailable.
How replication server and payload replication type impact the availability of protocol actions
If your replication type is delayed (asynchronous), you can upload files of any size to the data center when the replication server is down in the local or other data centers. Files are replicated later when the replication server becomes operational.
If your replication type is immediate (synchronous), non-inline payloads must be replicated to at least one other data center for the upload to be successful. For example, if you upload a file in data center 1, then the following components must be operational:
- At least one Global Mailbox management node in data center 2
- Shared disk in data center 2
- At least one replication server in data center 1
These components are required so that the Global Mailbox management node in data center 2 can pull the file into the shared disk in data center 2 (local shared disk). The Global Mailbox management node pulls the file by using the replication server in data center 1.
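The component requirements for an immediate-replication upload can be expressed as a small check. This is a minimal sketch under assumed names: the `DataCenterStatus` class and its fields are hypothetical and only mirror the three components listed above.

```python
from dataclasses import dataclass


@dataclass
class DataCenterStatus:
    """Hypothetical snapshot of the components relevant to payload replication."""
    management_nodes_up: int
    shared_disk_up: bool
    replication_servers_up: int


def immediate_upload_can_replicate(
    upload_dc: DataCenterStatus, remote_dc: DataCenterStatus
) -> bool:
    """Check the components needed to replicate a non-inline payload."""
    return (
        remote_dc.management_nodes_up >= 1      # pulls the file in the remote data center
        and remote_dc.shared_disk_up            # destination of the pulled file
        and upload_dc.replication_servers_up >= 1  # serves the file from the upload data center
    )


if __name__ == "__main__":
    dc1 = DataCenterStatus(management_nodes_up=1, shared_disk_up=True, replication_servers_up=1)
    dc2 = DataCenterStatus(management_nodes_up=1, shared_disk_up=True, replication_servers_up=0)
    print(immediate_upload_can_replicate(upload_dc=dc1, remote_dc=dc2))
```

In this example the replication server in data center 2 is down, but an upload in data center 1 can still replicate because only the replication server in the upload data center is needed.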
Mailboxes can be created by using a protocol because those tasks are independent of payload replication.
How WebSphere MQ impacts availability of protocol actions
When a file is uploaded, a message is placed on WebSphere® MQ to indicate that an event should be created. If WebSphere MQ is down, the message is not created and the upload fails.
File downloads are not impacted by WebSphere MQ availability.
Messages are processed in the data center in which they are received. A message or file need not be replicated before the event is raised.
- An event is raised on the queue in the data