Non-IBM Disclaimer

The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.

Sunday, August 19, 2018

DataPower High Availability support overview

Introduction
Nowadays, many organizations can no longer operate in "silo" mode. To keep up with an extremely fast-changing world, they need bidirectional integration with clients and partners, both consuming third-party APIs and exposing their own. Many of them choose IBM DataPower Gateways to keep these integrations secure. Any outage in the SOA middleware infrastructure might cause a company to lose business, clients and credibility. To keep the ball rolling, most companies implement High Availability across the stack, eliminating single points of failure by adding redundancy so that the failure of any single component does not compromise the entire system. This post describes the High Availability patterns that DataPower appliances support for inbound transactions.

HA option 1 - Standby Control
The DataPower Standby Control feature is based on VRRP technology and allows two or more DataPower appliances to operate in Active/Standby mode. Standby Control provides HA between two different network interfaces (NICs), is part of the NIC configuration, and allows an interface to fail over to the standby NIC (usually located on another appliance). The NIC with the highest priority is the active NIC: it owns the virtual IP address (VIP), receives all inbound requests, processes them and returns the results. All other NICs in the HA group are in standby mode, configured with a lower priority and waiting to take over the VIP if the active NIC fails.
This feature is included with all DataPower appliances and doesn't require any additional license. For virtual appliances, Docker appliances, and appliances running in the cloud, it is possible in some cases to set up Standby Control; however, there are restrictions that should be addressed before proceeding with this option. It is also possible to configure this HA pattern on two different NICs of the same appliance, although older firmware versions did not support this.
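
The priority rule described above can be illustrated with a small simulation. This is only a sketch of the election idea, not DataPower configuration or a VRRP implementation: the Nic class, the interface names and the priority values are made up for illustration, and details such as advertisement timers, preemption settings and tie-breaking are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nic:
    """One interface in a standby group (hypothetical names and fields)."""
    name: str
    priority: int
    healthy: bool = True


def elect_active(group: list[Nic]) -> Optional[Nic]:
    """Return the healthy NIC with the highest priority, or None if all are down."""
    candidates = [nic for nic in group if nic.healthy]
    return max(candidates, key=lambda nic: nic.priority, default=None)


group = [
    Nic("dp1-eth10", priority=100),
    Nic("dp2-eth10", priority=90),
]

print(elect_active(group).name)   # dp1-eth10 owns the VIP

group[0].healthy = False          # simulate a failure of the active NIC
print(elect_active(group).name)   # dp2-eth10 takes over the VIP
```

Only one NIC owns the VIP at any moment, which is why this pattern gives redundancy but no extra capacity.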

HA option 2 - Self Balancing
Standby Control is nice, but it doesn't scale, and scaling is exactly the benefit of the Self Balancing pattern. Self Balancing is provided by the DataPower Option for Application Optimization (AO). AO is a field-upgradable option that requires an additional license, i.e. it has its own part number. Self Balancing is based on the Standby Control feature and extends its capabilities to support active/active mode. One NIC is the Distributor: it owns the VIP, receives all inbound requests and distributes them for processing among all the NICs (appliances) in the group, including itself.
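
The Distributor role can be sketched in the same spirit. The post does not say which algorithm the appliance uses to spread the load, so plain round-robin is assumed here purely for illustration; the Distributor class and the appliance names are hypothetical as well.

```python
from itertools import cycle


class Distributor:
    """Owns the VIP and hands each inbound request to a group member."""

    def __init__(self, name: str, others: list[str]):
        self.name = name
        # The distributor is itself one of the processing targets.
        self._targets = cycle([name, *others])

    def dispatch(self, request_id: str) -> tuple[str, str]:
        """Return (request_id, appliance chosen to process it)."""
        return request_id, next(self._targets)


vip_owner = Distributor("dp1", others=["dp2", "dp3"])
for i in range(4):
    print(vip_owner.dispatch(f"req-{i}"))
# ('req-0', 'dp1'), ('req-1', 'dp2'), ('req-2', 'dp3'), ('req-3', 'dp1')
```

Because the Distributor still relies on Standby Control underneath, another NIC in the group can take over the VIP, and with it the Distributor role, if the current owner fails.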

HA option 3 - External Network Load Balancer
In this architecture, HA is configured, deployed and operated on an external network load balancer (NLB). The configuration and features depend on the NLB vendor and are therefore not described in this post.

Summary
What would be a good fit for your organization? It depends on many different aspects. Standby Control is a great option and a good starting point, especially for new DataPower implementations. If you require active/active operation, an external NLB might seem like a very straightforward approach, especially in organizations that have already adopted an NLB solution. Indeed, it has many advantages:
  1. It allows active/active mode without purchasing an additional DataPower license for AO.
  2. An existing, skilled load balancer team is there to assist with any issue.
  3. Procedures and runbooks are already written and in use.
  4. Implementing the NLB as a reverse proxy adds another layer of security.
  5. In addition, you'd prefer to keep all load balancing practice in one place, wouldn't you?
However, in the long run this approach might introduce some challenges. For example:

  1. Working in an active/active topology might make troubleshooting longer and more complicated, because any of two or more DataPower Gateways could have run the transaction in question. This risk can be managed by using DataPower Operations Dashboard.
  2. The source IP of inbound transactions would now be the external NLB IP. Will that impact your troubleshooting techniques and authentication policies? Would it be possible to disable source IP NAT on the NLB?
  3. Depending on the NLB vendor, the health check requirements and the chosen implementation, each DataPower FSH might need to be represented as a separate farm on the NLB. This makes DataPower development processes longer, dependent on external resources and, therefore, more complicated. Having a defined policy for FSH use in place would mitigate this risk.
  4. When network traffic is SSL/TLS encrypted, the NLB in some cases needs the ability to inspect data (headers and payload), which might involve additional licensing costs for an NLB SSL termination component and affect overall performance.
  5. If SSL caching is implemented, adding an NLB might cause problems.
  6. Adding another point of failure might make troubleshooting longer and more complicated.
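
To make challenge 2 concrete: when source IP NAT cannot be disabled, a common workaround is to have the NLB pass the original client address in an HTTP header. The sketch below assumes the NLB sets the conventional X-Forwarded-For header, which depends on the vendor and its configuration; the client_ip function, the trusted NLB address and the sample IPs are hypothetical, and this is plain Python rather than a DataPower policy.

```python
from typing import Mapping


def client_ip(peer_ip: str, headers: Mapping[str, str], trusted_nlbs: set[str]) -> str:
    """Best-effort original client IP for a request that may have crossed an NLB."""
    # Assumes the header name appears exactly as "X-Forwarded-For" in the mapping.
    forwarded = headers.get("X-Forwarded-For", "")
    if peer_ip in trusted_nlbs and forwarded:
        # The left-most entry is conventionally the original client. It is only
        # trustworthy if the NLB sets or sanitizes the header itself.
        return forwarded.split(",")[0].strip()
    # Direct connection, or an untrusted peer: fall back to the TCP peer address.
    return peer_ip


nlbs = {"10.0.0.5"}
print(client_ip("10.0.0.5", {"X-Forwarded-For": "203.0.113.7, 10.0.0.5"}, nlbs))  # 203.0.113.7
print(client_ip("198.51.100.9", {"X-Forwarded-For": "1.2.3.4"}, nlbs))            # 198.51.100.9
```

Any authentication or logging rule that used to rely on the TCP source address would need a similar adjustment, which is part of the extra effort this option brings.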
What would be the best fit for your organization? Feel free to reach out with any questions.
