GOOGLE DATAFLOW AND APACHE BEAM (II)

What is Apache Beam? Apache Beam is an open source, unified programming model for defining and executing data processing pipelines, both batch and streaming. It is exposed through several SDKs that let you execute a pipeline on different processing engines, also known as runners. The supported runners so far are: Apache Spark, Apache […]

Google Dataflow And Apache Beam (I)

A bit of context first. As some of you may know, in 2004 Google released the MapReduce paper, which became the cornerstone of a whole new set of open source technologies that make up the big data ecosystem as we know it (Hadoop, Pig, Hive, Spark, Kafka, etc.). Meanwhile, Google followed its own path by developing other […]