When Traditional High Availability Is Not Good Enough

In this blog post, we explore how to provide a highly available key management and virtual HSM (vHSM) service for the relevant cryptographic use cases, comparing Unbound's pure-software technology with legacy HSMs.

The digital revolution has transformed the landscape of business. Traditional high availability is no longer good enough; key applications must be accessible at all times for businesses to survive and thrive in today’s highly competitive and dynamic environment. Meeting these higher availability demands requires a well-thought-out strategy that accounts for the increasing complexity of enterprise application infrastructures. Data centers and systems now span the globe, integrating disparate business processes. Designing your application infrastructure for continuous availability therefore begins with an architecture that covers all the underlying services, including cryptographic ones.

The goal of a High Availability (HA) architecture is to mitigate or prevent application downtime or outages caused by any type of infrastructure failure. Disaster recovery primarily deals with falling back on a secondary site in case of a failure at the primary site. With globalization and the Internet driving application access from all corners of the world, keeping applications available all the time is more important than ever before.

Numerous applications rely on cryptographic functions at the back end. In fact, practically every application we use in our daily activities, whether for authenticating to a service or an app, banking transactions, secure browsing or sending an email, critically depends on the durability and availability of a cryptographic service.

99% is Not 100%

In the past, applications safeguarded their encryption keys in Hardware Security Modules (HSMs): dedicated, rigid and inflexible hardware cryptographic appliances.

The declared Mean Time Between Failures (MTBF) values of several legacy HSM vendors vary in the range of 5-40 years. The enormous standard deviation of this range (over 14 years) says much about the flawed prediction methods used by these manufacturers. It is unrealistic to believe that an HSM could last for 40 years even under ideal operating conditions, not to mention that it would no longer be technologically relevant.

The availability of the HSM can be calculated using the following formula:

Availability = MTBF / (MTBF + MTTR)

where MTTR is the Mean Time To Repair.

This yields an overall availability of 95%-99%.
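To make the arithmetic concrete, here is a minimal Python sketch that takes availability as MTBF / (MTBF + MTTR), the standard steady-state definition, with MTTR being the mean time to repair. The post does not quote any repair times, so the sketch works backwards instead: it derives the MTTR that a given availability would imply for an MTBF at the low end of the quoted range. The function names and the 5-year input are illustrative assumptions, not vendor data.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime divided by uptime plus repair time."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def implied_mttr(mtbf_hours: float, target_availability: float) -> float:
    """Repair time that would produce the target availability for a given MTBF.

    Obtained by solving A = MTBF / (MTBF + MTTR) for MTTR.
    """
    return mtbf_hours * (1 - target_availability) / target_availability


# Assumption for illustration: an MTBF at the low end of the 5-40 year range.
mtbf = 5 * HOURS_PER_YEAR

for target in (0.95, 0.99):
    mttr = implied_mttr(mtbf, target)
    print(f"{target:.0%} availability implies about {mttr / 24:.0f} days "
          f"of downtime per failure")
```

The same two functions can be reused with any MTBF and MTTR pair for which real figures are available.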

Eliminating the Single Point of Failure

Unbound Tech completely eliminates the single point of compromise for the most sensitive assets, ensuring keys and secrets are never kept whole (as they were when protected inside HSMs). Unbound implements multi-party computation (MPC) to create and use the fragmented secret without ever unifying it, a method mathematically proven to withstand a breach or hack of any single location.
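The idea of using a key without ever assembling it can be illustrated with a toy example. The Python sketch below is not Unbound's protocol and is not a secure cryptosystem; it assumes a simple additive secret sharing over a prime modulus, and every name in it is hypothetical. Each party multiplies its own share by a public value, and only the partial results are combined, so the key itself never exists in one place.

```python
import secrets

# Toy modulus (assumption): the Mersenne prime 2^127 - 1.
P = 2**127 - 1


def split(secret: int) -> tuple[int, int]:
    """Split a secret into two additive shares; each share alone is uniformly random."""
    share_a = secrets.randbelow(P)
    share_b = (secret - share_a) % P
    return share_a, share_b


def partial_product(share: int, x: int) -> int:
    """Work done locally by one party, using only its own share."""
    return (share * x) % P


def combine(part_a: int, part_b: int) -> int:
    """Combine the partial results (not the shares) into the final answer."""
    return (part_a + part_b) % P


key = secrets.randbelow(P)
share_a, share_b = split(key)      # in a real system the shares live on different servers
x = 123456789                      # public input to the computation

result = combine(partial_product(share_a, x), partial_product(share_b, x))
print(result == (key * x) % P)     # the result matches, yet the key was never reassembled
```

Real MPC protocols handle far richer operations (signing, decryption) and defend against malicious parties; the sketch only shows why splitting the key does not prevent computing with it.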

The Unbound Key Control (UKC) system consists of one or more pairs of standard servers that are installed and managed by the user. Each pair consists of an Entry Point and a Partner. Together, they form the secure boundary of the UKC. To satisfy the minimum high availability requirements, two pairs (four servers) must be used.

Applications within the network connect to the entry point for consuming cryptographic services for the keys that are managed within the UKC.

UKC provides a solution with high availability, meaning that no single server failure stops UKC functionality. An aspect of high availability is the existence of a Disaster Recovery (DR) or Continuity of Business (COB) site that takes over once the main site fails. While such a site is not required to be online as long as the main site is functional, it does need to stay connected and data synced with the online system, so that it can take over as needed with up-to-date key material.
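From the application's point of view, surviving the loss of one pair means being able to reach another Entry Point. The post does not describe how UKC clients select an Entry Point, so the following Python sketch is only a generic failover pattern: the host names, `call_with_failover` and the stand-in `fake_sign` operation are all hypothetical and do not represent any UKC API.

```python
from typing import Callable, Optional, Sequence


def call_with_failover(entry_points: Sequence[str],
                       operation: Callable[[str], bytes]) -> bytes:
    """Try each Entry Point in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for host in entry_points:
        try:
            return operation(host)
        except ConnectionError as exc:  # this Entry Point (or its pair) is unreachable
            last_error = exc
    raise RuntimeError("no UKC pair is reachable") from last_error


# Hypothetical host names, one Entry Point per pair.
ENTRY_POINTS = ["ep1.example.internal", "ep2.example.internal"]


def fake_sign(host: str) -> bytes:
    """Stand-in for a cryptographic request; simulates the first pair being down."""
    if host == "ep1.example.internal":
        raise ConnectionError(f"{host} is down")
    return f"signed-by-{host}".encode()


result = call_with_failover(ENTRY_POINTS, fake_sign)
print(result)
```

With two pairs configured, the request succeeds through the second Entry Point even though the first is unreachable; only when every pair is down does the call fail.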

Measured UKC software server availability is 99.9%. Hence, the availability of a single UKC pair is 99.8% (0.999 × 0.999, since the Entry Point and Partner operate in series). Pairs run in parallel, so the cluster is unavailable only when every pair is down at the same time. The following table shows the availability of the UKC service for a given number of pairs running in parallel:

Number of pairs    UKC cluster availability
1                  99.8%
2                  99.9996%
3                  99.9999992%
4                  99.9999999984%

With just 2 UKC pairs (a total of 4 servers) one can reach an availability level typically feasible only for telecom-grade equipment (between five and six nines).

With 3 UKC pairs (a total of 6 servers), availability reaches about eight nines, the level of an IaaS/PaaS service. With 4 pairs (a total of 8 servers) it reaches nearly 11 nines (!), comparable to the 11 nines of durability that AWS quotes for S3.
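The series and parallel arithmetic behind these figures can be checked with a short Python sketch. It assumes that server failures are independent, that a pair is up only when both of its servers are up, and that the cluster is up as long as at least one pair is up. The function names are illustrative.

```python
import math

SERVER_AVAILABILITY = 0.999  # measured per-server figure quoted in the text


def pair_availability(server: float = SERVER_AVAILABILITY) -> float:
    """Entry Point and Partner in series: both must be up."""
    return server * server


def cluster_unavailability(pairs: int, server: float = SERVER_AVAILABILITY) -> float:
    """Pairs in parallel: the cluster is down only if every pair is down."""
    return (1 - pair_availability(server)) ** pairs


def cluster_availability(pairs: int, server: float = SERVER_AVAILABILITY) -> float:
    return 1 - cluster_unavailability(pairs, server)


def nines(pairs: int, server: float = SERVER_AVAILABILITY) -> float:
    """Number of nines, computed from the unavailability to avoid rounding loss."""
    return -math.log10(cluster_unavailability(pairs, server))


for n in range(1, 5):
    print(f"{n} pair(s): {cluster_availability(n):.10%}  ({nines(n):.1f} nines)")
```

Because the independence assumption rarely holds perfectly (shared networks, power or software faults affect several servers at once), these numbers are best read as upper bounds.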

Use Cases

A very high level of availability for key management and cryptographic keys is paramount for services that serve a large number of end users. Such cryptography-consuming services include:

  • Code signing for a SaaS / large enterprise
  • Protecting SSL keys for a hosted-websites provider
  • Document signing for a SaaS / large enterprise
  • Securing payment transactions for a bank
  • PGP within an organization
  • IPsec for a telecom / SP network
  • Smart metering for a water / gas / electrical utility
  • File-level encryption for endpoint devices in an enterprise

Deployment Options Improving High Availability

The location of the UKC cluster nodes is determined by the application architecture, the locations of the users consuming the services and regulatory compliance requirements. The following figure depicts several possible topologies that enable an elaborate high availability scheme, such as locating the UKC nodes:

  • On-prem – in the DC and the DR sites
  • Hybrid – on-prem and at the cloud service provider (CSP)
  • Single CSP – across different regions / availability zones
  • Different CSPs – a node per each CSP (at least)

[Figure: Deployment Options Improving High Availability]

In Short…

It is essential to create a coherent availability design for the required services that is appropriate to the particular business processes and matches how critical each of these processes is to the overall business mission of the organization. Based on this information, a proper arrangement for high availability should be made, preventing downtime of crucial components of the service, such as key management and protection.

Unbound UKC allows applications to enjoy superior availability, comparable only to that of a cloud service provider’s infrastructure.

George Wainblat

George Wainblat joined Unbound in June 2017 as Director of Product Management. George brings a wealth of experience in leading multi-disciplinary product, engineering and business units at global hi-tech companies as well as startups.
