A vCloud Director setup consists of several components. At a minimum, you need vCD cells, a database and an NFS server. Depending on the features in use, additional components may be necessary, for example an AMQP broker or a Cassandra database. To make this setup fully highly available, each of these components must be redundant. So it’s useful to take a look at some basic HA considerations for the individual components.
Let’s start with the simplest:
The vCloud Director cells
It doesn’t take much to make the vCloud Director cells highly available. You only need to deploy several cells and, ideally, make them accessible via a load balancer. I already described in an older blog post how to configure an NSX Load Balancer for vCloud Director.
The cells in a cluster share the vCD database and the NFS share and use the same configuration, for example the vCenter and NSX Manager connections, user sessions and everything else. However, only one cell in a vCD cluster acts as vCenter proxy. If this cell fails, another cell takes over this role, and the user doesn’t notice the failure.
This works extremely well and requires very little effort. To add more cells, you only have to specify the existing vCloud Director database and the existing NFS share during the initial configuration of the new cell. Starting with vCD 9.5, the responses.properties file must also be located on this NFS share.
Basically, that’s it.
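A load balancer in front of the cells needs some way to tell which cells are currently usable. The following is a minimal sketch of such a check, not the configuration from the older blog post. It assumes that each cell answers a status request at the path `/cloud/server_status` with the plain-text body `Service is up.`; both the path and the expected body are assumptions here, and the function names are hypothetical.

```python
from typing import Callable, List
from urllib.request import urlopen

# Assumption: a running cell answers the status request with this body.
EXPECTED_BODY = "Service is up."


def http_probe(cell: str, timeout: float = 5.0) -> str:
    """Fetch the status page of one cell (the URL path is an assumption)."""
    url = f"https://{cell}/cloud/server_status"
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def healthy_cells(cells: List[str],
                  probe: Callable[[str], str] = http_probe) -> List[str]:
    """Return the cells whose probe response signals a running service."""
    up = []
    for cell in cells:
        try:
            body = probe(cell)
        except OSError:  # refused connection, timeout, TLS or HTTP error
            continue
        if body.strip() == EXPECTED_BODY:
            up.append(cell)
    return up
```

Passing the probe as a parameter keeps the selection logic testable without a network. In a real environment the load balancer's own health monitor would do this job.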
HA considerations for the database servers
High availability for the database servers is more complicated. As of vCD 9.5, only PostgreSQL and MS SQL Server are supported as database backends, and MS SQL Server is already marked as deprecated and will no longer be supported in the near future.
Postgres natively implements several replication modes, including streaming replication and logical replication. However, both are only suitable for active-passive replication with a single master node.
To add automatic failover on top of this, there are open source projects that enable a highly available Postgres setup. At this point I’ll just mention “PostgreSQL Automatic Failover” (PAF), which is based on Pacemaker and Corosync, and the “Patroni” project from Zalando.
Both solutions are pure community projects without support. And for me the question is always about the operational risk. What happens, for example, if something goes wrong in such a setup and a split-brain scenario occurs? Do I want to risk that my customers lose some of their configurations and work from the last x hours? Everyone has to answer this question for themselves. In a smaller environment with few changes to VMs, networks and org settings, this may be less critical. In a large environment with thousands of customers and automatic provisioning of customers, this can have a much greater impact.
So, my recommendation is as follows:
When you build such a setup, consider the risks and what could go wrong. In addition, the cluster should always span 3 independent sites, and therefore have 3 quorum votes, to keep the risk of a split-brain scenario as low as possible. Furthermore, you should have enough Linux knowledge to build and maintain such a setup. We must not forget: the database backend is one of the most critical vCD components.
I would also like to mention that there are commercial offerings for building a highly available PostgreSQL cluster. One very prominent provider is 2ndQuadrant, which offers Postgres-BDR as a solid solution with enterprise support.
As you can see from these considerations, an HA setup with Postgres is not trivial. For some people it may therefore be more advisable to continue using MS SQL Server as the database backend with Always On availability groups. This is a proven, rock-solid, but also expensive solution.
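The reasoning behind an odd number of votes can be sketched with a few lines of majority arithmetic. This is a generic illustration of quorum voting, not the algorithm of any particular cluster stack, and the function names are hypothetical.

```python
from typing import List


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """A partition may keep running only with a strict majority of all votes."""
    return reachable_votes > total_votes // 2


def sides_with_quorum(partition: List[int]) -> List[bool]:
    """For each side of a network partition, report whether it holds quorum."""
    total = sum(partition)
    return [has_quorum(votes, total) for votes in partition]
```

With two votes split one against one, `sides_with_quorum([1, 1])` yields `[False, False]`: neither side can safely continue, and a setup that lets both continue anyway ends up in a split brain. With three votes split two against one, `sides_with_quorum([2, 1])` yields `[True, False]`, so exactly one side keeps the master role.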
What about the NFS server?
The NFS server should not be forgotten in the HA considerations for the vCloud Director stack. It is not the most critical component, but if it fails, uploads are no longer possible for customers.
Here too, there are various guides on the Internet on how to build NFS server clusters with open source software (for example, here). Just like the Postgres setups, these are based on the Pacemaker and Corosync cluster stack, sometimes natively with LVM and sometimes with DRBD. This works quite well, but in rare cases there may be “stale file handle” errors. This can happen if an application is accessing a file while the network connection to the NFS server is lost, for example during a failover.
Basically, you should ask yourself whether you want to take on the effort of a highly available NFS server with Linux. An alternative would be an enterprise storage solution (SAN or NAS) that supports NFS and can provide a share. Generally, these storage systems already provide sufficient reliability.
You could also consider whether the NFS server should be redundant at all. The impact for customers is rather small and if you virtualize this NFS server, you can quickly restore a backup or clone the original VM and start it manually in case of a failure.
How do I make the AMQP broker highly available?
In short: It’s best not to.
RabbitMQ is most commonly used as the AMQP broker software. Configuring several RabbitMQ servers as a cluster is very simple at first, but can quickly become very complex. Additional pitfalls lurk in the operation of such a cluster.
RabbitMQ does not handle network partitions very well. VMware therefore recommends not distributing the RabbitMQ servers of a cluster across several sites (see here). The tricky thing about this split-brain scenario is that the individual servers continue to function, but messages are no longer processed properly. This results in unpredictable behavior.
There are also some things to consider when managing a RabbitMQ cluster. For example, after a shutdown of all servers, the last node that was powered off must be started first. Otherwise the cluster will not come up again. The HA policies can also be misconfigured in such a way that messages are lost if the master queue dies suddenly.
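The start-up rule is easy to get wrong under pressure, so it can help to record the shutdown order and derive the start order from it. The following is a minimal sketch with hypothetical node names. It simply reverses the recorded shutdown order, which is one order that satisfies the rule that the last node stopped is the first node started.

```python
from typing import List


def startup_order(shutdown_order: List[str]) -> List[str]:
    """Derive a start order in which the last node stopped comes up first."""
    if not shutdown_order:
        raise ValueError("shutdown order must contain at least one node")
    # Reversing the list puts the most recently stopped node at the front.
    return list(reversed(shutdown_order))


if __name__ == "__main__":
    # Hypothetical node names, stopped in this order during maintenance.
    stopped = ["rmq1", "rmq2", "rmq3"]
    print(startup_order(stopped))
```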
Altogether the HA chapter in the RabbitMQ documentation is quite extensive and shows how complex this topic can become.
As so often, this raises the question of how much complexity is necessary to reach a certain goal, and what the customer impact is if we guarantee less or no high availability for this component.
This messaging component is actually only used to integrate extensions such as vCloud Availability or the vRealize Operations Tenant App into vCloud Director. For some people, these may be essential services, and they are willing to make the effort to make RabbitMQ highly available. Many others don’t see a big impact if these additional services are unavailable for a moment.
The RabbitMQ server(s) also don’t store information permanently. They only forward messages. So if a RabbitMQ server fails, you can easily start a new server with the same IP settings and RabbitMQ configuration. An alternative to an HA setup would be to clone the active RabbitMQ server manually, or replicate it frequently, and use the second VM as a cold standby machine.
Last but not least – The Cassandra Database
Cassandra is used in a vCloud Director setup to store performance metrics. Personally, I don’t find this component mission critical, but other people may see it differently.
The good thing about Cassandra is that this database system is designed for scalability and high availability. That makes it easy to make this component redundant. You only need a corresponding number of nodes and should pay attention to the replication factor and the quorum consistency. For a node to fail without affecting the cluster and without a split-brain scenario, plan for at least 4 nodes with a replication factor of 3. With a replication factor of 3, a quorum is 2 replicas, so quorum reads and writes still succeed when one replica is down.
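The relationship between replication factor and quorum can be checked with a little arithmetic. The sketch below uses the usual majority formula for Cassandra's QUORUM consistency level, floor(RF / 2) + 1, which gives 2 for a replication factor of 3. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer at QUORUM consistency."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that may be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in range(1, 6):
        print(rf, quorum(rf), tolerated_replica_failures(rf))
```

A replication factor of 2 tolerates no failed replica at quorum consistency, while a replication factor of 3 tolerates one. This is why 3 is the smallest replication factor that makes a quorum setup useful for availability.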
VMware provides high availability only for the vCloud Director cells. We are responsible for all other components. It is therefore extremely important to consider the redundancy of these other components.
As smart engineers and architects, we should always ask ourselves the following questions:
- What is the customer impact if component $foo fails?
- How much effort does it take to build component $foo redundantly?
- How much effort does it take to maintain this highly available setup for component $foo?
- Is this effort in proportion to the benefits?
- Could this more complex setup create a higher operational risk?
- Does this potentially higher operational risk exceed the impact on customers if component $foo fails?
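These questions can be turned into a simple checklist per component. The following is only a sketch of the reasoning, assuming a hypothetical 1 to 5 scale for each answer and a deliberately crude rule: HA is considered worthwhile only if the customer impact of a failure exceeds the build effort, the maintenance effort and the added operational risk. The scores in the usage example are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Answers to the questions above on a hypothetical scale from 1 to 5."""
    component: str
    customer_impact: int         # impact on customers if the component fails
    build_effort: int            # effort to build the component redundantly
    maintenance_effort: int      # effort to maintain the HA setup
    added_operational_risk: int  # extra risk introduced by the HA setup

    def ha_worthwhile(self) -> bool:
        """True if the impact outweighs every cost of the HA setup."""
        highest_cost = max(self.build_effort,
                           self.maintenance_effort,
                           self.added_operational_risk)
        return self.customer_impact > highest_cost


if __name__ == "__main__":
    # Placeholder scores for an imaginary component $foo.
    foo = Assessment("foo", customer_impact=4, build_effort=2,
                     maintenance_effort=2, added_operational_risk=1)
    print(foo.component, foo.ha_worthwhile())
```

The value of such a checklist lies less in the numbers than in forcing an explicit answer to each question for every component.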
To summarize all these aspects:
- The vCD cells can easily be designed redundantly without higher risk.
- With a little effort, the database backend can also be designed redundantly. But you should take care to avoid split-brain scenarios. Otherwise, the operational risk would probably exceed the customer impact in the event of a failure.
- If you use a SAN or NAS system as NFS share, you can provide sufficient redundancy without much effort or higher operational risk.
- RabbitMQ clusters can be very error-prone. The benefit of an HA setup is usually out of proportion to the higher operational risk.
- Cassandra databases can be easily scaled and redundantly designed. Apart from the high resource requirements, this does not result in a higher operational risk.