Choosing a RDB for a Data Platform: a Journey Towards PostgreSQL (Part II)


This is the continuation of our comparison study between PostgreSQL and Oracle. In this part, we explain the key concepts of high availability and present the conclusions of the study.

High-Availability  

Achieving high availability depends strongly on the use cases the database has to cover. Read-only databases have to ensure reliability and minimize outage time, while transactional configurations must focus on consistency and zero data loss.

Both Oracle and PostgreSQL offer several tools to replicate and restore data, as well as to recover from a failover. However, it is worth pointing out that Oracle RAC presents a single point of failure, since the data and the Oracle WebCenter Content configuration are stored on a shared disk. To mitigate this risk, additional hardware-level replication (storage arrays) is performed.

Since replication with Oracle is done at the hardware level, the other choice to make is between an active-active (two masters) and an active-passive (master-slave) model. Typically, at BBVA we use active-active for transactional use cases and active-passive for read-only access.

Similar HA configurations are possible with PostgreSQL, although replication is done in software (streaming replication). For use cases with full ACID requirements, synchronous replication is necessary, at the cost of a penalty in transaction performance. Fast networks help to minimize this penalty.

The other approach is asynchronous replication, which gives better performance but can cause consistency issues. In order to evaluate the trade-off properly, the replication lag between nodes must be measured.
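As a reference, the replication state and an approximate lag can be inspected on the master with a query like the following (a minimal sketch; the column names are those of PostgreSQL 10 and later, older releases use the xlog-based equivalents):

```sql
-- Run on the master: one row per connected standby.
SELECT application_name,
       sync_state,   -- 'sync', 'potential' or 'async'
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```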

Redundant configurations are typically recommended, with some nodes replicating synchronously and others asynchronously. It is worth mentioning that synchronous replication doubles the risk of an outage: if either of the two nodes goes down, the other suffers an outage as well. Some configuration parameters help to minimize the outage time; in particular, a list of nodes can be specified to replace the synchronous replica if it goes down.
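In PostgreSQL this behaviour can be sketched with a couple of parameters in postgresql.conf (a minimal example; the standby names are placeholders):

```
# postgresql.conf on the master -- minimal sketch, standby names are placeholders
synchronous_commit = on

# The first listed standby that is connected acts as the synchronous replica;
# if it goes down, the next one in the list takes over that role.
synchronous_standby_names = 'standby1, standby2'
```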

Finally, there is the master-master vs master-standby decision. Master-master replication sounds like the Holy Grail but, in practice, high availability depends more on the synchronous vs asynchronous decision. Since synchronous replication is what guarantees no data loss, a system with master-master synchronous replication offers no higher availability than a master-standby one: in the event of a failover on one of the master nodes, in-flight transactions will not finish, which is exactly the same situation as in master-standby synchronous replication. The difference, then, lies in the complexity of the configuration, which is much lower in a master-standby model.

The following tables try to sum up the advantages and disadvantages of each option:

 

|                          | Advantages                      | Disadvantages                                  |
|--------------------------|---------------------------------|------------------------------------------------|
| Asynchronous replication | Better write performance        | Potential data loss; needs conflict resolution |
| Synchronous replication  | Full consistency (no data loss) | Worse write performance                        |

 

|                          | Master-Standby                                 | Master-Master                                           |
|--------------------------|------------------------------------------------|----------------------------------------------------------|
| Asynchronous replication | Better write performance; potential data loss  | Highest availability; potential data loss                |
| Synchronous replication  | No data loss; worse write performance          | No data loss; worse write performance; more complexity   |

 

Hence, our conclusion is that the right architecture for our HA needs has to include:

– One master node and several standby nodes.

– Synchronous replication to a hot standby node, which can also serve read queries. This is what ensures no data loss and minimizes the lag on writes.

– Asynchronous replication to at least one more standby node. Asynchronous replication gives better write performance, and this node is necessary in case one of the other two fails. If the master node fails, the hot standby node is promoted to master and the warm standby node becomes a hot standby. If it is the hot standby node that fails, a configuration change is needed to point the synchronous replication to the warm standby node (see the sketch below).
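A minimal sketch of how each standby could point at the master in this topology, assuming a pre-12 PostgreSQL where standbys are driven by recovery.conf (host names, users and application names are placeholders):

```
# recovery.conf on the hot standby (PostgreSQL < 12) -- illustrative values only
standby_mode = 'on'
primary_conninfo = 'host=master1 port=5432 user=replicator application_name=standby_hot'

# The warm standby uses the same file with application_name=standby_warm.
# The master matches application_name against synchronous_standby_names, so
# listing only standby_hot there keeps the warm node asynchronous; listing both
# lets the warm node take over the synchronous role, as mentioned earlier.
```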

 

[Figure 3: proposed HA architecture with one master, a synchronous hot standby and an asynchronous warm standby]

 

Using a model like the one shown in figure 3, we performed a test consisting of shutting down the master node and measuring the time taken to promote the hot standby node. The status of the master node is checked every 3 seconds, although this interval is configurable. Once the failure is detected, the promotion operation involves stopping the standby node, changing its configuration and restarting it. The next figure shows that the whole operation takes about 11 seconds.

 

[Figure: time measured to detect the master failure and promote the hot standby node (about 11 seconds)]
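For reference, the detect-and-promote loop can be sketched roughly as follows (a simplified bash sketch using pg_isready and pg_ctl promote; hosts and paths are placeholders, and the actual test stopped, reconfigured and restarted the standby rather than calling promote directly):

```bash
#!/bin/bash
# Simplified failover watchdog -- illustrative only, not the exact PoC script.
MASTER_HOST=master1
STANDBY_DATA=/var/lib/postgresql/data   # data directory of the local hot standby
CHECK_INTERVAL=3                        # seconds between health checks

while true; do
    if ! pg_isready -h "$MASTER_HOST" -p 5432 -q; then
        # A real watchdog would retry a few times here to avoid a false positive.
        echo "$(date) master unreachable, promoting hot standby"
        pg_ctl promote -D "$STANDBY_DATA"
        break
    fi
    sleep "$CHECK_INTERVAL"
done
```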

 

HA is often measured as a number of “nines”, representing the percentage of time the service is available over a given period:

 

| Availability (%)        | Downtime/year | Downtime/30 days | Downtime/week |
|-------------------------|---------------|------------------|---------------|
| “Two nines” (99%)       | 3.65 days     | 7.2 hours        | 1.69 hours    |
| “Three nines” (99.9%)   | 8.76 hours    | 43.2 minutes     | 10.1 minutes  |
| “Four nines” (99.99%)   | 52.56 minutes | 4.32 minutes     | 1.01 minutes  |
| “Five nines” (99.999%)  | 5.26 minutes  | 25.9 seconds     | 6.06 seconds  |

 

With the results obtained in our test, even if we suffered a couple of major failovers per month (roughly 2 x 11 = 22 seconds of downtime every 30 days), we would still stay within a “five-nines” availability level.

Conclusions

The decision is about comparing features, performance and cost.

It is difficult to evaluate the real difference in the bill, but our feeling is that choosing PostgreSQL could cover the price of an enterprise support contract and still save money. Of course, the cost of migration must be taken into account, but in the long term there is a clear saving.

Saving money is not, by itself, enough to justify the decision if it has a negative impact on the capabilities of the relational database.

There are features where there is arguably a tie between Oracle and PostgreSQL, while for a few of them one of the databases has the edge. It is worth pointing out that not all features have the same relevance and, in some cases, their weight depends on the architecture of the Data Platform.

As discussed previously, concerns about high availability in PostgreSQL can be discarded. Still, there are several possible combinations of replication model and number of master nodes in the architecture. Recommending one option over the rest depends on the use cases but, for transactional needs, a synchronous-replication, master-standby model could be the best option.

Thus, choosing PostgreSQL as the relational database for the Datio Platform seems to be a good option, as well as an important, but not risky, bet. It’s just about walking the path ahead.


Raúl Alonso

As a physicist, I've always been interested in how things work. I started my career as a developer and analyst for a consulting company, and in 2011 I joined BBVA. After some years dedicated to project management, since 2015 I have been part of the Data Architecture team, where I specialize in databases and DaaS.


5 thoughts on “Choosing a RDB for a Data Platform: a Journey Towards PostgreSQL (Part II)”

  1. Hi

    If you talk about data replication in Oracle terms, that is Data Guard (not disk cabinets) and not RAC, so I am not sure comparing Postgres replication with RAC is a good comparison. Data Guard is a standby database, and a primary database in Oracle supports up to 30 standby destinations (some use these standby destinations with Active Data Guard as a real-time reporting farm).

    In RAC, if one node goes down, the application downtime is measured in seconds and only 50% of sessions die (with 2 nodes; in a 3-node RAC it is 33%, with 4 nodes 25%, and so on). I am not familiar with Postgres, but I guess promoting a standby node as the new primary takes longer (11 seconds in your post, which is a great figure) and the outage is 100% until the new primary is promoted. It’s more similar to Oracle Data Guard: when the primary dies, the failover to the standby nodes takes longer. So comparing Data Guard with Postgres replication is more of an orange-to-orange comparison.

    Data Guard is a disaster recovery solution; RAC is an HA solution.

    Active-active, master-master replication is also possible in Oracle; that’s where GoldenGate comes into play. Master-master is a complex topic, since we have to deal with Conflict Detection and Resolution (CDR).

    Finally, cost is always going to be higher with Oracle due to the licenses.

  2. Hi, Lishan.
    Thanks for your comments.
    The comparison was about HA models in both RDBMSs. In Oracle, RAC provides instant replication of data at the hardware layer, so there is no outage except in the event of a broken disk. In Postgres, the model for HA is based on streaming replication, as explained in the post. In this model, a complete outage can happen, so the solution is to perform the failover as fast as possible (11 seconds in our PoC, though my guess is that we can improve it). Depending on the defined SLAs, even with several outages per year we can still call it “high availability”.

    Both models are just different, and the potential outage will depend on the nature of the failure. A broken disk in Oracle RAC could cause an outage (probably a long one, unless a Data Guard-based solution is deployed). In an HA model like the one proposed for PostgreSQL, the possibility of an outage would depend on which node is affected (if it’s the asynchronously replicated one, there would be no problem). Even with a broken disk in the master node, if the platform is able to self-provision a new instance, the outage will probably be shorter than in the Oracle RAC model.
    Of course, you are right about the better (though not perfect) outage level in Oracle RAC in the case of software failures (dead nodes), which is actually the more likely scenario.
    I think there is arguably a tie between both models for HA but, even if we consider the Oracle one the better (an understandable view), it’s about deciding whether it’s worth paying for. It will probably depend on the use case.

  3. Hi

    I don’t get the point where you say the data in RAC is “instant replication”. Data in RAC is not replicated, it is written to disk. If you use ASM external redundancy, the protection is in the storage RAID; in that case, if a disk fails, the storage takes care of it. If you use ASM normal or high (Exadata) redundancy, the I/O is done by ASM and it writes to 2 or more fail groups; if one fail group (which can be one or more physical disks) fails, the data is still in another fail group, so we don’t lose any data. In Exadata, for example, the data is stored in three cells (3 physical servers plus their physical disks). None of this is replication.

    If we talk about disk replication, it is based on storage mirroring (EMC SRDF, for example), which is a binary, block-by-block copy that can be replicated across different data centers. The problem with storage mirroring is that, since it replicates binary data, it can propagate data corruption from the primary site to the DR site; it does not happen often, but it happens.

    Thanks
