This is the continuation of our comparison study between PostgreSQL and Oracle. In this part, we explain the key concepts of high availability and present the conclusions of the study.
Achieving high availability strongly depends on the use cases the database must cover. Read-only databases have to ensure reliability and minimize outage time, while transactional configurations must focus on consistency and zero data loss.
Both Oracle and PostgreSQL provide several tools to replicate and restore data, as well as to recover from a failover. However, it is worth pointing out that Oracle RAC presents a single point of failure, since both the data and the Oracle WebCenter Content configuration are stored on a shared disk. To minimize this risk, additional hardware-level replication (disk arrays) is performed.
Since replication with Oracle is done at the hardware level, the remaining decision concerns active-active (two masters) versus active-passive (master-slave) topologies. Typically, at BBVA we use active-active for transactional use cases and active-passive for read-only access.
Similar HA configurations are possible with PostgreSQL, although replication is done in software (streaming replication). For use cases with fully ACID requirements, synchronous replication is necessary, at the cost of a transaction performance penalty. Fast networks help to minimize it.
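As an illustration, synchronous streaming replication in PostgreSQL is enabled on the master with settings along these lines in postgresql.conf; the standby name is a hypothetical example, and `wal_level = replica` applies to recent PostgreSQL versions:

```
# postgresql.conf on the master node (sketch, names are illustrative)
wal_level = replica                      # write enough WAL for streaming replication
synchronous_commit = on                  # commits wait for standby confirmation
synchronous_standby_names = 'standby1'   # standby that must confirm each commit
```

With `synchronous_commit = on`, each commit blocks until the named standby confirms it has received the transaction, which is the source of the write penalty mentioned above.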
Another approach is asynchronous replication, which yields better performance but can cause consistency issues. To evaluate this trade-off properly, the replication lag between nodes must be measured.
Redundant configurations are typically recommended, with some nodes replicating synchronously and others asynchronously. It is worth mentioning that synchronous replication doubles the risk of an outage: if either node goes down, the other suffers an outage as well. Some configuration parameters help to minimize the outage time: a list of candidate nodes can be specified to replace a synchronous replica that goes down.
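In PostgreSQL, this is what the `synchronous_standby_names` parameter provides: it accepts a list of candidates, and if the current synchronous standby goes down, the next available node in the list takes over that role. A minimal sketch, with hypothetical standby names:

```
# postgresql.conf on the master node
# standby1 acts as the synchronous replica; if it goes down,
# standby2 automatically becomes the synchronous replica
synchronous_standby_names = 'standby1, standby2'
```

This way the master does not stay blocked waiting for a dead synchronous standby, which mitigates the doubled-outage risk described above.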
Finally, there is the master-master versus master-standby decision. Master-master replication sounds like the Holy Grail but, in practice, high availability depends more on the synchronous versus asynchronous decision. Since synchronous replication is what guarantees no data loss, a master-master synchronous setup offers no higher availability than a master-standby one: if one of the master nodes fails, in-flight transactions won't finish, exactly as in master-standby synchronous replication. The difference lies in the complexity of the configuration, which is much lower in a master-standby model.
The following tables sum up the advantages and disadvantages of each option:
Master-master:

| Replication mode | Advantages | Disadvantages |
|---|---|---|
| Asynchronous replication | Better write performance | Potential data loss; needs conflict resolution |
| Synchronous replication | Full consistency (no data loss) | Worse write performance |

Master-standby:

| Replication mode | Advantages | Disadvantages |
|---|---|---|
| Asynchronous replication | Better write performance | Potential data loss |
| Synchronous replication | No data loss | Worse write performance |
Hence, our conclusion is that the right architecture for HA must include:
– One master node and several standby nodes.
– Synchronous replication to a hot standby node, which can also serve read queries. This is what ensures no data loss while keeping the replication lag on writes to a minimum.
– Asynchronous replication to at least one more standby node. Asynchronous replication gives better write performance, and this node is needed in case one of the other two fails. If the master node fails, the hot standby is promoted to master and the warm standby becomes the new hot standby. If it is the hot standby that fails, a configuration change is needed to point the synchronous replication at the warm standby node.
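A minimal sketch of the two failover scenarios, assuming hypothetical node names and a default data directory; these commands act on a live cluster, so they are shown for illustration only:

```
# Case 1: the master fails -> promote the hot standby to master
pg_ctl promote -D /var/lib/postgresql/data

# Case 2: the hot standby fails -> make the warm standby the synchronous replica
psql -c "ALTER SYSTEM SET synchronous_standby_names = 'standby2';"
psql -c "SELECT pg_reload_conf();"
```

In case 2 the change only touches configuration, so no restart of the master is needed; a reload is enough for the new synchronous replica to take effect.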
Using a model like the one shown in figure 3, we performed a test consisting of shutting down the master node and measuring the time taken to promote the hot standby node. The status of the master node is checked every 3 seconds, although this interval is configurable. Once the failover is detected, the promotion involves stopping the standby node, changing its configuration and restarting it. The next figure shows that the whole operation takes about 11 seconds.
HA is often measured in a number of “nines”, representing the percentage of uptime over a period of time:
| Availability (%) | Downtime/year | Downtime/30 days | Downtime/week |
|---|---|---|---|
| “Two nines” (99%) | 3.65 days | 7.2 hours | 1.68 hours |
| “Three nines” (99.9%) | 8.76 hours | 43.2 minutes | 10.1 minutes |
| “Four nines” (99.99%) | 52.56 minutes | 4.32 minutes | 1.01 minutes |
| “Five nines” (99.999%) | 5.26 minutes | 25.9 seconds | 6.05 seconds |
With the results obtained in our test, even if we suffered a couple of major failovers per month, we would still achieve a “five nines” availability level.
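The arithmetic behind this claim can be checked with a short script: two failovers per month at about 11 seconds each add up to roughly 22 seconds of downtime per 30 days, which stays under the 25.9-second budget of “five nines”. A minimal sketch:

```python
def downtime_budget(availability: float, period_seconds: float) -> float:
    """Maximum allowed downtime (in seconds) for a given availability level."""
    return (1.0 - availability) * period_seconds

MONTH = 30 * 24 * 3600          # 30 days, in seconds

budget = downtime_budget(0.99999, MONTH)   # "five nines" budget per 30 days
measured = 2 * 11                          # two failovers of ~11 s each

print(round(budget, 1))          # 25.9
print(measured <= budget)        # True
```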
Ultimately, the decision comes down to comparing features, performance and cost.
It is difficult to evaluate the real difference in the bill, but our impression is that choosing PostgreSQL could cover the price of enterprise support and still save money. Of course, the cost of migration must be taken into account, but in the long term there are clear savings.
Saving costs is not enough by itself to justify the decision if it has a negative impact on the capabilities of the relational database.
There are features where Oracle and PostgreSQL are arguably tied, while in a few others one of the databases has the edge. It is worth pointing out that not all features have the same relevance and, in some cases, their weight is determined by the architecture of the Data Platform.
As discussed previously, concerns about high availability in PostgreSQL can be dismissed. Still, there are several possible combinations of replication model and number of master nodes in the architecture. Recommending one option over the rest depends on the use cases but, for transactional needs, a synchronous-replication, master-standby model may be the best option.
Thus, choosing PostgreSQL as the relational database for the Datio Platform seems a good option, as well as an important, but not risky, bet. It's just about walking the path ahead.
As a physicist, I've always been interested in how things work. I started my career as a developer and analyst for a consulting company, and in 2011 I joined BBVA. After some years dedicated to project management, since 2015 I have been part of the Data Architecture team, where I specialize in databases and DaaS.