Choosing a RDB for a Data Platform: a Journey Towards PostgreSQL (Part I)

post-oracle-vs-postgres

 

In a data Platform, there is an obvious need of a Relational Database. Though proprietary systems like Oracle are more widely used, open-source systems like PostgreSQL have gained popularity over the last years. This fact is supported by the strength of the open-source community, which has made possible the maturity process in order to get a set of features that can rival, and at times surpass, those of Oracle. The present document summarizes the capabilities of the product, with special attention to high availability (the main concern when talking about Postgres), and put them in comparison with Oracle.

Introduction

Aside of saving costs, advantages of open-source products include a higher speed of testing and improving, as well as the possibility of having a deeper knowledge. PostgreSQL has a 20-years-old community with a wide offer of professional support, an extremely talented 200-member team that leads the roadmap and hundreds of developers, architects and administrators who provide documentation, training, dissemination and peer-to-peer support. The result is a product which is cheap and robust, top in offered features and easy to use by developers and DBAs.

Feature Comparison

Security

Since security tends to have a custom treatment in a Data Platform, there is no need of going deep in this issue. Being able to reach RSL (row security level), both Oracle and PostgreSQL are integrable with Kerberos, Ldap and other standard security tools.

Database Administration

Backing up, restoring, replication and monitoring features are similar in both products, and both lack of a tool for Database-as-a-service, probably a must in next years.

Backing up and Restoring

pg_basebackup offers replication for the entire database cluster, since pg_dump can extract just one database into a file.

pgBackRest is a rather new tool for PostgreSQL (v.9.4) which performs efficient backup for terabytes databases. It offers compression, partial or failed backup resume, asynchronous archiving, full-differential-incremental and some other tricks. This tool gives also the backup restore and point-in-time recovery.

Oracle Secure Backup covers essentially the same.

Replication

Logical and binary (physical) replication can be done in Oracle and PostgreSQL. Logical replication is indicated for upgrading of version or for scalling out (adding nodes to a cluster) in some cases, while physical replication is best suited for disaster recovery.

Oracle Data Guard provides both kinds of replication, even though logical is better done with Oracle Streams or Golden Gate. In Oracle RAC, replication is done by the disk cabins, with a very high performance.

PostgreSQL has several tools for logical (slony, pglogical, which has a better performance and an easier setup) and physical replication (postgresFDW, pg_basebackup, rsync or repmgr), with a broad set of tuneable parameters to adjust outage time, performance, commit time, etc.

Binary replication is, of course, a must for achieving high-availability.

 Monitoring

There are several suites for monitoring Oracle and PostgreSQL clusters. However, databases inside a Data Platform are usually monitored by own tools.

Scalability

PostgreSQL 9.5 and next version 9.6 are rather focused in horizontal scalability. Postgres-XL is also integrated with version 9.5 and is born to provide scalling out capabilities. In Oracle, this is based in RAC (Real Application Cluster), which can get good performance but presents a single point of failure in the sharing disk.

Storaging

Between Oracle and PostgreSQL there are some differences in database size limits and data type limits, being the PostgreSQL ones a bit more generous.

In contrast, Oracle can provide column-oriented tables, while that’s a work in progress in PostgreSQL. But, since column-oriented databases are good for a high load of queries and not-so-good for writings, the improvement in performance can be covered by clustered architectures in PostgreSQL.

Flexibility

PostgreSQL is extremely extensible because its operation is catalog-driven. This means that information about tables, columns, data types, functions, etc is stored in tables, which can be modified by the user. Combining this fact with the community background, the result is an ever-growing stack of extended features in comparison with other relational databases.

Aside from that, PostgreSQL has 5 procedural languages as for now, in contrast with Oracle’s PL/SQL.

Loading

Load speed is something in what PostgreSQL has been improving recently. pgloader exceeded the previous COPY method, while pg_bulkloader  can get even higher load rates by skipping some sources of overhead, such as integrity errors.

Architectures based in multinode clusters also achieve higher speeds.

Compression

Compression in PostgreSQL is made with TOAST (The Oversized-Attribute Storage Technique), which prevents large field values to be stored directly. A threshold for what is considered a large value could be set.

Oracle Advanced Compression is probably a more versatile tool, since compression can be applied to any field regardless of the size.

However, compression has a negative impact in write operations, so it may not be a good idea to compress data when working with sizes under the Postgres threshold.

Isolation

There are several levels of isolation, from “read commited” (typically the default option) to “serializable snapshot isolation” (SSI), being “snapshot isolation” (SI) a middle option. Oracle’s multiversion concurrency control (MVCC) implementation gives SI, but anomalies could occur yet with this level of isolation.

Implementing SSI means to be able to detect and resolve conflicts between concurrent transactions, which could lead to data anomalies.

SSI can be implemented with strict 2-phase locking, but this implies blocks and poor performance.  PostgreSQL takes a different approach to ensuring serializability: it runs transactions using snapshot isolation, but adds additional checks to determine whether anomalies are possible.

Workload Management

Workload management of memory and disk can be done for databases in PostgreSQL, but community version has no tool for limiting consumption of resources by particular users. Enterprise DB Postgres Plus provides the feature, just as Oracle Resource Manager does.

Querying

The last version of PostgreSQL to date (9.5) introduced group by features like GROUPING SETS, CUBE and ROLL UP, which were already present in Oracle.

That version also brought the UPSERT sentence (INSERT… ON CONFLICT, which inserts a new row, or update it in case already exists). This can also get done in Oracle, with a much more complicated syntax.

Last, but not least, PostGIS is the most powerful tool for geospatial queries, overcoming Oracle Locator/Spatial.

Performance

Benchmarks give Oracle the edge in terms of query performance for single-node instances. Again, clustering appears here to turn the situation around, with solutions like Postgres-XL, which fit well with some use cases, like batch ingestion and read-only transactions.

NO-SQL

This is one of the biggest advantages over Oracle, given that Postgres’ JSON/JSONB types (available since v9.4) close the gap between RDBs and No-SQL. This not only lets to store unstructured data, but also gives the possibility to define inverse indexes, achieving complex searching capabilities.

Vendor Lock-in

As is well-known, Oracle licensing is quite expensive, while PostgreSQL is an open-source product. Moreover, enterprise support for PostgreSQL is offered for several companies, so vendor lock-in is also avoided in this sense.

mm

Raúl Alonso

As a physicist, I've always been interested on how things work. I started my career as a developer and analyst for a consulting company and in 2011 I signed for BBVA. After some years dedicated to project management, since 2015 I am part of the Data Architecture team, where I'm specialized in databases and DaaS.

More Posts